Jupyter Notebook
PymimidBook Last Checkpoint: 6 hours ago (autosaved) Current Kernel Logo

Python 3

Trusted
  • File
    • New Notebook
      • Python 3
    • Open...
    • Make a Copy...
    • Save as...
    • Rename...
    • Save and Checkpoint
    • Revert to Checkpoint
      • Friday, August 16, 2019 4:38 AM
    • Print Preview
    • Download as
      • Notebook (.ipynb)
      • Python (.py)
      • HTML (.html)
      • Reveal.js slides (.html)
      • Markdown (.md)
      • reST (.rst)
      • LaTeX (.tex)
      • PDF via LaTeX (.pdf)
      • asciidoc (.asciidoc)
      • html (.html)
      • html (.html)
      • html (.html)
      • html (.html)
      • html (.html)
      • html (.html)
      • latex (.tex)
      • latex (.tex)
      • markdown (.md)
      • notebook (.ipynb)
      • pdf (.pdf)
      • python (.py)
      • rst (.rst)
      • script (.txt)
      • notebook (.ipynb)
      • slides (.slides.html)
      • slides (.slides.html)
    • Deploy as
      • Jupytext
        • Jupytext reference
        • Autosave notebook
        • Pair Notebook with light Script
        • Pair Notebook with percent Script
        • Pair Notebook with Hydrogen Script
        • Pair Notebook with Markdown
        • Pair Notebook with R Markdown
        • Custom pairing
        • Unpair notebook
      • Trusted Notebook
      • Close and Halt
    • Edit
      • Cut Cells
      • Copy Cells
      • Paste Cells Above
      • Paste Cells Below
      • Paste Cells & Replace
      • Delete Cells
      • Undo Delete Cells
      • Split Cell
      • Merge Cell Above
      • Merge Cell Below
      • Move Cell Up
      • Move Cell Down
      • Keymaps
        • default
        • emacs
        • vim
        • sublime
      • Edit Notebook Metadata
      • Find and Replace
      • Cut Cell Attachments
      • Copy Cell Attachments
      • Paste Cell Attachments
      • Insert Image
      • nbextensions config
    • View
      • Toggle Header
      • Toggle Toolbar
      • Toggle Line Numbers
      • Cell Toolbar
        • None
        • Edit Metadata
        • Raw Cell Format
        • Slideshow
        • Attachments
        • Tags
    • Insert
      • Insert Cell Above
      • Insert Cell Below
      • Insert Heading Above
      • Insert Heading Below
    • Cell
      • Run Cells
      • Run Cells and Select Below
      • Run Cells and Insert Below
      • Run All
      • Run All Above
      • Run All Below
      • Cell Type
        • Code
        • Markdown
        • Raw NBConvert
      • Current Outputs
        • Toggle
        • Toggle Scrolling
        • Clear
      • All Output
        • Toggle
        • Toggle Scrolling
        • Clear
      • Execution Timings
        • Toggle visibility (selected)
        • Toggle visibility (all)
        • Clear (selected)
        • Clear (all)
    • Kernel
      • Interrupt
      • Restart
      • Restart & Clear Output
      • Restart & Run All
      • Reconnect
      • Shutdown
      • Change kernel
        • Python 3
    • Navigate
      • 1  Mimid : Inferring Grammars
        • 1.1  Verify System Version
        • 1.2  Install Prerequisites
          • 1.2.1  Recommended Extensions
            • 1.2.1.1  Table of contents
            • 1.2.1.2  Collapsible headings
            • 1.2.1.3  Execute time
            • 1.2.1.4  Code folding
          • 1.2.2  Cleanup
          • 1.2.3  Magic for cell contents
        • 1.3  Our subject programs
          • 1.3.1  Calculator.py
          • 1.3.2  Mathexpr.py
          • 1.3.3  Microjson.py
          • 1.3.4  URLParse.py
          • 1.3.5  Netrc.py
          • 1.3.6  CGIDecode.py
          • 1.3.7  Subject Registry
        • 1.4  Rewriting the source to track control flow and taints.
          • 1.4.1  InRewriter
            • 1.4.1.1  Using It
          • 1.4.2  Rewriter
            • 1.4.2.1  The method context wrapper
            • 1.4.2.2  The stack wrapper
            • 1.4.2.3  The scope wrapper
            • 1.4.2.4  Rewriting If conditions
            • 1.4.2.5  Rewriting while loops
            • 1.4.2.6  Combining both
            • 1.4.2.7  Generating the complete instrumented source
            • 1.4.2.8  Using It
          • 1.4.3  Generate Transformed Sources
          • 1.4.4  Context Mangers
            • 1.4.4.1  Method context
            • 1.4.4.2  Stack context
            • 1.4.4.3  Scope context
          • 1.4.5  Taint Tracker
        • 1.5  Generating Traces
        • 1.6  Mining the Traces Generated
          • 1.6.1  Reconstructing the Method Tree with Attached Character Comparisons
            • 1.6.1.1  Identifying last comparisons
            • 1.6.1.2  Attaching characters to the tree
            • 1.6.1.3  Removing Overlap
            • 1.6.1.4  Generate derivation tree
          • 1.6.2  The Complete Miner
        • 1.7  Generalize Iterations
          • 1.7.1  Checking compatibility of nodes
            • 1.7.1.1  Using it
          • 1.7.2  Propagate rename of the while node up the child nodes.
          • 1.7.3  Generalize a given set of loops
          • 1.7.4  Collect loops to generalize
        • 1.8  Generating a Grammar
          • 1.8.1  Trees to grammar
          • 1.8.2  Inserting Empty Alternatives for IF and Loops
          • 1.8.3  Learning Regular Expressions
            • 1.8.3.1  The modified Fernau algorithm
          • 1.8.4  Remove duplicate and redundant entries
        • 1.9  Accio Grammar
        • 1.10  Libraries
          • 1.10.1  StringIO replacement
          • 1.10.2  ShLex Replacement
      • 2  Evaluation
        • 2.1  Initialization
          • 2.1.1  Check Recall
          • 2.1.2  Check Precision
          • 2.1.3  Timer
        • 2.2  Subjects
          • 2.2.1  CGIDecode
            • 2.2.1.1  Golden Grammar
            • 2.2.1.2  Samples
            • 2.2.1.3  Mimid
            • 2.2.1.4  Autogram
          • 2.2.2  Calculator
            • 2.2.2.1  Golden Grammar
            • 2.2.2.2  Samples
            • 2.2.2.3  Mimid
            • 2.2.2.4  Autogram
          • 2.2.3  MathExpr
            • 2.2.3.1  Golden Grammar
            • 2.2.3.2  Samples
            • 2.2.3.3  Mimid
            • 2.2.3.4  Autogram
          • 2.2.4  URLParse
            • 2.2.4.1  Golden Grammar
            • 2.2.4.2  Samples
            • 2.2.4.3  Mimid
            • 2.2.4.4  Autogram
          • 2.2.5  Netrc
            • 2.2.5.1  Golden Grammar
            • 2.2.5.2  Samples
            • 2.2.5.3  Mimid
            • 2.2.5.4  Autogram
          • 2.2.6  Microjson
            • 2.2.6.1  Microjson Validation
            • 2.2.6.2  Samples
            • 2.2.6.3  Mimid
            • 2.2.6.4  Autogram
        • 2.3  Results
          • 2.3.1  Table II (Time in Seconds)
          • 2.3.2  Table III (Precision)
          • 2.3.3  Table IV (Recall)
        • 2.4  Using a Recognizer (not a Parser)
        • 2.5  Parsing with Parser Combinators
          • 2.5.1  Helper
          • 2.5.2  Subject - assignment
          • 2.5.3  Sample
          • 2.5.4  Recovering the parse tree
        • 2.6  Parsing with PEG Parser
          • 2.6.1  PEG samples
    • Widgets
      • Save Notebook Widget State
      • Clear Notebook Widget State
        • Download Widget State
        • Embed Widgets
      • Help
        • User Interface Tour
        • Keyboard Shortcuts
        • Edit Keyboard Shortcuts
        • Notebook Help
        • Markdown
        • Jupyter-contrib
          nbextensions
        • Python Reference
        • IPython Reference
        • NumPy Reference
        • SciPy Reference
        • Matplotlib Reference
        • SymPy Reference
        • pandas Reference
        • About
      Contents
      • 1  Mimid : Inferring Grammars
        • 1.1  Verify System Version
        • 1.2  Install Prerequisites
          • 1.2.1  Recommended Extensions
            • 1.2.1.1  Table of contents
            • 1.2.1.2  Collapsible headings
            • 1.2.1.3  Execute time
            • 1.2.1.4  Code folding
          • 1.2.2  Cleanup
          • 1.2.3  Magic for cell contents
        • 1.3  Our subject programs
          • 1.3.1  Calculator.py
          • 1.3.2  Mathexpr.py
          • 1.3.3  Microjson.py
          • 1.3.4  URLParse.py
          • 1.3.5  Netrc.py
          • 1.3.6  CGIDecode.py
          • 1.3.7  Subject Registry
        • 1.4  Rewriting the source to track control flow and taints.
          • 1.4.1  InRewriter
            • 1.4.1.1  Using It
          • 1.4.2  Rewriter
            • 1.4.2.1  The method context wrapper
            • 1.4.2.2  The stack wrapper
            • 1.4.2.3  The scope wrapper
            • 1.4.2.4  Rewriting If conditions
            • 1.4.2.5  Rewriting while loops
            • 1.4.2.6  Combining both
            • 1.4.2.7  Generating the complete instrumented source
            • 1.4.2.8  Using It
          • 1.4.3  Generate Transformed Sources
          • 1.4.4  Context Mangers
            • 1.4.4.1  Method context
            • 1.4.4.2  Stack context
            • 1.4.4.3  Scope context
          • 1.4.5  Taint Tracker
        • 1.5  Generating Traces
        • 1.6  Mining the Traces Generated
          • 1.6.1  Reconstructing the Method Tree with Attached Character Comparisons
            • 1.6.1.1  Identifying last comparisons
            • 1.6.1.2  Attaching characters to the tree
            • 1.6.1.3  Removing Overlap
            • 1.6.1.4  Generate derivation tree
          • 1.6.2  The Complete Miner
        • 1.7  Generalize Iterations
          • 1.7.1  Checking compatibility of nodes
            • 1.7.1.1  Using it
          • 1.7.2  Propagate rename of the while node up the child nodes.
          • 1.7.3  Generalize a given set of loops
          • 1.7.4  Collect loops to generalize
        • 1.8  Generating a Grammar
          • 1.8.1  Trees to grammar
          • 1.8.2  Inserting Empty Alternatives for IF and Loops
          • 1.8.3  Learning Regular Expressions
            • 1.8.3.1  The modified Fernau algorithm
          • 1.8.4  Remove duplicate and redundant entries
        • 1.9  Accio Grammar
        • 1.10  Libraries
          • 1.10.1  StringIO replacement
          • 1.10.2  ShLex Replacement
      • 2  Evaluation
        • 2.1  Initialization
          • 2.1.1  Check Recall
          • 2.1.2  Check Precision
          • 2.1.3  Timer
        • 2.2  Subjects
          • 2.2.1  CGIDecode
            • 2.2.1.1  Golden Grammar
            • 2.2.1.2  Samples
            • 2.2.1.3  Mimid
            • 2.2.1.4  Autogram
          • 2.2.2  Calculator
            • 2.2.2.1  Golden Grammar
            • 2.2.2.2  Samples
            • 2.2.2.3  Mimid
            • 2.2.2.4  Autogram
          • 2.2.3  MathExpr
            • 2.2.3.1  Golden Grammar
            • 2.2.3.2  Samples
            • 2.2.3.3  Mimid
            • 2.2.3.4  Autogram
          • 2.2.4  URLParse
            • 2.2.4.1  Golden Grammar
            • 2.2.4.2  Samples
            • 2.2.4.3  Mimid
            • 2.2.4.4  Autogram
          • 2.2.5  Netrc
            • 2.2.5.1  Golden Grammar
            • 2.2.5.2  Samples
            • 2.2.5.3  Mimid
            • 2.2.5.4  Autogram
          • 2.2.6  Microjson
            • 2.2.6.1  Microjson Validation
            • 2.2.6.2  Samples
            • 2.2.6.3  Mimid
            • 2.2.6.4  Autogram
        • 2.3  Results
          • 2.3.1  Table II (Time in Seconds)
          • 2.3.2  Table III (Precision)
          • 2.3.3  Table IV (Recall)
        • 2.4  Using a Recognizer (not a Parser)
        • 2.5  Parsing with Parser Combinators
          • 2.5.1  Helper
          • 2.5.2  Subject - assignment
          • 2.5.3  Sample
          • 2.5.4  Recovering the parse tree
        • 2.6  Parsing with PEG Parser
          • 2.6.1  PEG samples
      17
       
      1
      # Mimid :  Inferring Grammars
      
      2
      ​
      
      3
      * Code for subjects [here](#Our-subject-programs)
      
      4
      * Evaluation starts [here](#Evaluation)
      
      5
        * The evaluation on specific subjects starts [here](#Subjects)
      
      6
          * [CGIDecode](#CGIDecode)
      
      7
          * [Calculator](#Calculator)
      
      8
          * [MathExpr](#MathExpr)
      
      9
          * [URLParse](#URLParse)
      
      10
          * [Netrc](#Netrc)
      
      11
          * [Microjson](#Microjson)
      
      12
      * Results are [here](#Results)
      
      13
      * Recovering parse tree from a recognizer is [here](#Using-a-Recognizer-(not-a-Parser))
      
      14
      * Recovering parse tree from parser combinators is [here](#Parsing-with-Parser-Combinators)
      
      15
      * Recovering parse tree from PEG parer is [here](#Parsing-with-PEG-Parser)
      
      16
      ​
      
      17
      Please note that a complete run can take an hour and a half to complete.
      

      1  Mimid : Inferring Grammars¶

      • Code for subjects here
      • Evaluation starts here
        • The evaluation on specific subjects starts here
          • CGIDecode
          • Calculator
          • MathExpr
          • URLParse
          • Netrc
          • Microjson
      • Results are here
      • Recovering parse tree from a recognizer is here
      • Recovering parse tree from parser combinators is here
      • Recovering parse tree from PEG parer is here

      Please note that a complete run can take an hour and a half to complete.

      1
       
      1
      We start with a few Jupyter magics that let us specify examples inline, that can be turned off if needed for faster execution. Switch `TOP to False` if you do not want examples to complete.
      

      We start with a few Jupyter magics that let us specify examples inline, that can be turned off if needed for faster execution. Switch TOP to False if you do not want examples to complete.

      In [1]:
      xxxxxxxxxx
      
      1
       
      1
      TOP = __name__ == '__main__'
      
      executed in 5ms, finished 04:51:36 2019-08-15
      . . .
      1
       
      1
      The magics we use are `%%var` and `%top`. The `%%var` lets us specify large strings such as file contents directly without too many escapes. The `%top` helps with examples.
      

      The magics we use are %%var and %top. The %%var lets us specify large strings such as file contents directly without too many escapes. The %top helps with examples.

      In [2]:
      xxxxxxxxxx
      
      23
       
      1
      from IPython.core.magic import  (Magics, magics_class, cell_magic, line_magic, line_cell_magic)
      
      2
      class B(dict):
      
      3
          def __getattr__(self, name):
      
      4
              return self.__getitem__(name)
      
      5
      @magics_class
      
      6
      class MyMagics(Magics):
      
      7
          def __init__(self, shell=None,  **kwargs):
      
      8
              super().__init__(shell=shell, **kwargs)
      
      9
              self._vars = B()
      
      10
              shell.user_ns['VARS'] = self._vars
      
      11
      ​
      
      12
          @cell_magic
      
      13
          def var(self, line, cell):
      
      14
              self._vars[line.strip()] = cell.strip()
      
      15
       
      
      16
          @line_cell_magic
      
      17
          def top(self, line, cell=None):
      
      18
              if TOP:
      
      19
                  if cell is None:
      
      20
                      cell = line
      
      21
                  ip = get_ipython()
      
      22
                  res = ip.run_cell(cell)
      
      23
      get_ipython().register_magics(MyMagics)
      
      executed in 10ms, finished 04:51:36 2019-08-15
      . . .
      1
       
      1
      ## Verify System Version
      

      1.1  Verify System Version¶

      In [3]:
      xxxxxxxxxx
      
      1
       
      1
      import sys
      
      executed in 6ms, finished 04:51:36 2019-08-15
      . . .
      1
       
      1
      Parts of the program, especially the subprocess execution using `do()` requires the new flags in `3.7`. I am not sure if the taints will work on anything above.
      

      Parts of the program, especially the subprocess execution using do() requires the new flags in 3.7. I am not sure if the taints will work on anything above.

      In [4]:
      xxxxxxxxxx
      
      1
       
      1
      %top assert sys.version_info[0:2] == (3, 7)
      
      executed in 7ms, finished 04:51:36 2019-08-15
      . . .
      In [5]:
      xxxxxxxxxx
      
      1
       
      1
      from subprocess import run
      
      executed in 6ms, finished 04:51:36 2019-08-15
      . . .
      In [6]:
      xxxxxxxxxx
      
      1
       
      1
      import os
      
      executed in 11ms, finished 04:51:36 2019-08-15
      . . .
      1
       
      1
      We keep a log of all system commands executed for easier debugging at `./build/do.log`.
      

      We keep a log of all system commands executed for easier debugging at ./build/do.log.

      In [7]:
      xxxxxxxxxx
      
      8
       
      1
      def do(command, env=None, shell=False, log=False, **args):
      
      2
          result = run(command, universal_newlines=True, shell=shell,
      
      3
                        env=dict(os.environ, **({} if env is None else env)),
      
      4
                       capture_output=True, **args)
      
      5
          if log:
      
      6
              with open('build/do.log', 'a+') as f:
      
      7
                  print(json.dumps({'cmd':command, 'env':env, 'exitcode':result.returncode}), env, file=f)
      
      8
          return result
      
      executed in 8ms, finished 04:51:36 2019-08-15
      . . .
      In [8]:
      xxxxxxxxxx
      
      1
       
      1
      import random
      
      executed in 5ms, finished 04:51:36 2019-08-15
      . . .
      1
       
      1
      Try to ensure replicability of measurements.
      

      Try to ensure replicability of measurements.

      In [9]:
      xxxxxxxxxx
      
      1
       
      1
      random.seed(0)
      
      executed in 6ms, finished 04:51:36 2019-08-15
      . . .
      1
       
      1
      Note that this notebook was tested on `Debian GNU/Linux 8.10 and 9.9` and `MacOS Mojave 10.14.5`. In particular, I do not know if everything will work on `Windows`.
      

      Note that this notebook was tested on Debian GNU/Linux 8.10 and 9.9 and MacOS Mojave 10.14.5. In particular, I do not know if everything will work on Windows.

      In [10]:
      xxxxxxxxxx
      
      1
       
      1
      import shutil
      
      executed in 6ms, finished 04:51:36 2019-08-15
      . . .
      In [11]:
      xxxxxxxxxx
      
      8
       
      1
      %%top
      
      2
      if shutil.which('lsb_release'):
      
      3
          res = do(['lsb_release', '-d']).stdout
      
      4
      elif shutil.which('sw_vers'):
      
      5
          res = do(['sw_vers']).stdout
      
      6
      else:
      
      7
          assert False
      
      8
      res
      
      executed in 37ms, finished 04:51:36 2019-08-15
      Out[11]:
      'ProductName:\tMac OS X\nProductVersion:\t10.14.5\nBuildVersion:\t18F132\n'
      
      . . .
      In [12]:
      xxxxxxxxxx
      
      1
       
      1
      %top do(['jupyter', '--version']).stdout
      
      executed in 774ms, finished 04:51:37 2019-08-15
      Out[12]:
      'jupyter core     : 4.5.0\njupyter-notebook : 5.7.8\nqtconsole        : 4.5.1\nipython          : 7.6.1\nipykernel        : 5.1.1\njupyter client   : 5.3.1\njupyter lab      : not installed\nnbconvert        : 5.5.0\nipywidgets       : 7.5.0\nnbformat         : 4.4.0\ntraitlets        : 4.3.2\n'
      
      . . .
      1
       
      1
      ## Install Prerequisites
      

      1.2  Install Prerequisites¶

      3
       
      1
      Our code is based on the utilities provided by the [Fuzzingbook](http://fuzzingbook.org). Note that the measurements on time and precision in paper were based on Fuzzingbook `0.0.7`. During the development, we found a few bugs in Autogram, which we communicated back, which resulted in a new version of Fuzzingbook `0.8.0`.
      
      2
      ​
      
      3
      The fixed *Autogram* implementation of the *Fuzzingbook* has better precision rates for *Autogram*, and timing for grammar generation. However, these numbers still fall short of *Mimid* for most grammars. Further, the grammars generated by *Autogram* are still enumerative. That is, rather than producing a context free grammar, it simply appends input strings as alternates to the `<START>` nonterminal. This again results in bad recall numbers as before. Hence, it does not change our main points. During the remainder of this notebook, we use the `0.8.0` version of the Fuzzingbook.
      

      Our code is based on the utilities provided by the Fuzzingbook. Note that the measurements on time and precision in paper were based on Fuzzingbook 0.0.7. During the development, we found a few bugs in Autogram, which we communicated back, which resulted in a new version of Fuzzingbook 0.8.0.

      The fixed Autogram implementation of the Fuzzingbook has better precision rates for Autogram, and timing for grammar generation. However, these numbers still fall short of Mimid for most grammars. Further, the grammars generated by Autogram are still enumerative. That is, rather than producing a context free grammar, it simply appends input strings as alternates to the <START> nonterminal. This again results in bad recall numbers as before. Hence, it does not change our main points. During the remainder of this notebook, we use the 0.8.0 version of the Fuzzingbook.

      1
       
      1
      First we define `pip_install()`, a helper to silently install required dependencies.
      

      First we define pip_install(), a helper to silently install required dependencies.

      In [13]:
      xxxxxxxxxx
      
      2
       
      1
      def pip_install(v):
      
      2
          return do(['pip', 'install', '-qqq', *v.split(' ')]).returncode
      
      executed in 7ms, finished 04:51:37 2019-08-15
      . . .
      In [14]:
      xxxxxxxxxx
      
      1
       
      1
      %top pip_install('fuzzingbook==0.8.0')
      
      executed in 878ms, finished 04:51:38 2019-08-15
      Out[14]:
      0
      
      . . .
      1
       
      1
      Our external dependencies other than `fuzzingbook` are as follows.
      

      Our external dependencies other than fuzzingbook are as follows.

      In [15]:
      xxxxxxxxxx
      
      1
       
      1
      %top pip_install('astor graphviz scipy')
      
      executed in 758ms, finished 04:51:39 2019-08-15
      Out[15]:
      0
      
      . . .
      1
       
      1
      **IMPORTANT:** Restart the jupyter server after installation of dependencies and extensions.
      

      IMPORTANT: Restart the jupyter server after installation of dependencies and extensions.

      1
       
      1
      ### Recommended Extensions
      

      1.2.1  Recommended Extensions¶

      1
       
      1
      We recommend the following jupyter notebook extensions:
      

      We recommend the following jupyter notebook extensions:

      In [16]:
      xxxxxxxxxx
      
      1
       
      1
      %top pip_install('jupyter_contrib_nbextensions jupyter_nbextensions_configurator')
      
      executed in 871ms, finished 04:51:40 2019-08-15
      Out[16]:
      0
      
      . . .
      In [17]:
      xxxxxxxxxx
      
      1
       
      1
      %top do(['jupyter','contrib','nbextension','install','--user']).returncode
      
      executed in 1.24s, finished 04:51:41 2019-08-15
      Out[17]:
      0
      
      . . .
      In [18]:
      xxxxxxxxxx
      
      1
       
      1
      def nb_enable(v): return do(['jupyter','nbextension','enable',v]).returncode
      
      executed in 5ms, finished 04:51:41 2019-08-15
      . . .
      In [19]:
      xxxxxxxxxx
      
      1
       
      1
      %top do(['jupyter','nbextensions_configurator','enable','--user']).returncode
      
      executed in 363ms, finished 04:51:41 2019-08-15
      Out[19]:
      0
      
      . . .
      3
       
      1
      #### Table of contents
      
      2
      ​
      
      3
      Please install this extension. The navigation in the notebook is rather hard without this installed.
      

      1.2.1.1  Table of contents¶

      Please install this extension. The navigation in the notebook is rather hard without this installed.

      In [20]:
      xxxxxxxxxx
      
      1
       
      1
      %top nb_enable('toc2/main')
      
      executed in 226ms, finished 04:51:42 2019-08-15
      Out[20]:
      0
      
      . . .
      3
       
      1
      #### Collapsible headings
      
      2
      ​
      
      3
      Again, do install this extension. This will let you fold away those sections that you do not have an immediate interest in.
      

      1.2.1.2  Collapsible headings¶

      Again, do install this extension. This will let you fold away those sections that you do not have an immediate interest in.

      In [21]:
      xxxxxxxxxx
      
      1
       
      1
      %top nb_enable('collapsible_headings/main')
      
      executed in 256ms, finished 04:51:42 2019-08-15
      Out[21]:
      0
      
      . . .
      3
       
      1
      #### Execute time
      
      2
      ​
      
      3
      This is not strictly necessary, but can provide a better breakdown than `timeit` that we use for timing.
      

      1.2.1.3  Execute time¶

      This is not strictly necessary, but can provide a better breakdown than timeit that we use for timing.

      In [22]:
      xxxxxxxxxx
      
      1
       
      1
      %top nb_enable('execute_time/ExecuteTime')
      
      executed in 229ms, finished 04:51:42 2019-08-15
      Out[22]:
      0
      
      . . .
      2
       
      1
      #### Code folding
      
      2
      Very helpful for hiding away source contents of libraries that are not for grammar recovery.
      

      1.2.1.4  Code folding¶

      Very helpful for hiding away source contents of libraries that are not for grammar recovery.

      In [23]:
      xxxxxxxxxx
      
      1
       
      1
      %top nb_enable('codefolding/main')
      
      executed in 208ms, finished 04:51:42 2019-08-15
      Out[23]:
      0
      
      . . .
      1
       
      1
      ### Cleanup
      

      1.2.2  Cleanup¶

      1
       
      1
      To make runs faster, we cache quite a lot of things. Remove `build` if you change code or samples.
      

      To make runs faster, we cache quite a lot of things. Remove build if you change code or samples.

      In [24]:
      xxxxxxxxxx
      
      1
       
      1
      %top do(['rm', '-rf','build']).returncode
      
      executed in 24ms, finished 04:51:42 2019-08-15
      Out[24]:
      0
      
      . . .
      1
       
      1
      ### Magic for cell contents
      

      1.2.3  Magic for cell contents¶

      1
       
      1
      As we mentioned before `%%var` defines a multi line embedded string that is accessible from Python.
      

      As we mentioned before %%var defines a multi line embedded string that is accessible from Python.

      In [25]:
      xxxxxxxxxx
      
      4
       
      1
      %%var Mimid
      
      2
      # [(
      
      3
      Testing Mimid
      
      4
      # )]
      
      executed in 4ms, finished 04:51:42 2019-08-15
      . . .
      In [26]:
      xxxxxxxxxx
      
      1
       
      1
      %top VARS['Mimid']
      
      executed in 6ms, finished 04:51:42 2019-08-15
      Out[26]:
      '# [(\nTesting Mimid\n# )]'
      
      . . .
      1
       
      1
      ## Our subject programs
      

      1.3  Our subject programs¶

      1
       
      1
      Note that our taint tracking implementation is incomplete in that only some of the functions in Python are proxied to preserve taints. Hence, we modify source slightly where necessary to use the proxied functions without affecting the evaluation of the grammar inferencing algorithm.
      

      Note that our taint tracking implementation is incomplete in that only some of the functions in Python are proxied to preserve taints. Hence, we modify source slightly where necessary to use the proxied functions without affecting the evaluation of the grammar inferencing algorithm.

      3
       
      1
      ### Calculator.py
      
      2
      ​
      
      3
      This is a really simple calculator written in text book recursive descent style. Note that I have used `list()` in a few places to help out with taint tracking. This is due to the limitations of my taint tracking prototype. It can be fixed if required by simple AST walkers or better taint trackers.
      

      1.3.1  Calculator.py¶

      This is a really simple calculator written in text book recursive descent style. Note that I have used list() in a few places to help out with taint tracking. This is due to the limitations of my taint tracking prototype. It can be fixed if required by simple AST walkers or better taint trackers.

      In [27]:
      xxxxxxxxxx
      
      53
       
      1
      %%var calc_src↔​
      
      executed in 4ms, finished 04:51:42 2019-08-15
      . . .
      3
       
      1
      ### Mathexpr.py
      
      2
      ​
      
      3
      Originally from [here]( https://github.com/louisfisch/mathematical-expressions-parser). The mathexpr is much more complicated than our `calculator` and supports advanced functionalities such as predefined functions and variables.
      

      1.3.2  Mathexpr.py¶

      Originally from here. The mathexpr is much more complicated than our calculator and supports advanced functionalities such as predefined functions and variables.

      In [28]:
      xxxxxxxxxx
      
      251
       
      1
      %%var mathexpr_src↔​
      
      executed in 6ms, finished 04:51:42 2019-08-15
      . . .
      2
       
      1
      ### Microjson.py
      
      2
      The microjson is a complete pure python implementation of JSON parser, that was obtained from from [here](https://github.com/phensley/microjson). Note that we use `myio` which is an instrumented version of the original `io` to preserve taints.
      

      1.3.3  Microjson.py¶

      The microjson is a complete pure python implementation of JSON parser, that was obtained from from here. Note that we use myio which is an instrumented version of the original io to preserve taints.

      In [29]:
      xxxxxxxxxx
      
      408
       
      1
      %%var microjson_src↔​
      
      executed in 7ms, finished 04:51:42 2019-08-15
      . . .
      3
       
      1
      ### URLParse.py
      
      2
      ​
      
      3
      This is the URL parser that is part of the Python distribution. The source was obtained from [here](https://github.com/python/cpython/blob/3.6/Lib/urllib/parse.py).
      

      1.3.4  URLParse.py¶

      This is the URL parser that is part of the Python distribution. The source was obtained from here.

      In [30]:
      xxxxxxxxxx
      
      1066
       
      1
      %%var urlparse_src↔​
      
      executed in 12ms, finished 04:51:42 2019-08-15
      . . .
      2
       
      1
      ### Netrc.py
      
      2
      Netrc is the initialization file that is read by web-agents such as CURL. Python ships the netrc library in its standard distribution. This was taken from [here](https://github.com/python/cpython/blob/3.6/Lib/netrc.py). Note that we use `mylex` and `myio` which corresponds to `shlex` and `io` from Python distribution, but instrumented to preserve taints.
      

      1.3.5  Netrc.py¶

      Netrc is the initialization file that is read by web-agents such as CURL. Python ships the netrc library in its standard distribution. This was taken from here. Note that we use mylex and myio which corresponds to shlex and io from Python distribution, but instrumented to preserve taints.

      In [31]:
      xxxxxxxxxx
      
      141
       
      1
      %%var netrc_src↔​
      
      executed in 5ms, finished 04:51:42 2019-08-15
      . . .
      3
       
      1
      ### CGIDecode.py
      
      2
      ​
      
      3
      The CGIDecode is a program to decode a URL encoded string. The source for this program was obtained from [here](https://www.fuzzingbook.org/html/Coverage.html).
      

      1.3.6  CGIDecode.py¶

      The CGIDecode is a program to decode a URL encoded string. The source for this program was obtained from here.

      In [32]:
      xxxxxxxxxx
      
      37
       
      1
      %%var cgidecode_src↔​
      
      executed in 4ms, finished 04:51:42 2019-08-15
      . . .
      3
       
      1
      ### Subject Registry
      
      2
      ​
      
      3
      We store all our subject programs under `program_src`.
      

      1.3.7  Subject Registry¶

      We store all our subject programs under program_src.

      In [33]:
      xxxxxxxxxx
      
      10
       
      1
      # [(
      
      2
      program_src = {
      
      3
          'calculator.py': VARS['calc_src'],
      
      4
          'mathexpr.py': VARS['mathexpr_src'],
      
      5
          'urlparse.py': VARS['urlparse_src'],
      
      6
          'netrc.py': VARS['netrc_src'],
      
      7
          'cgidecode.py': VARS['cgidecode_src'],
      
      8
          'microjson.py': VARS['microjson_src']
      
      9
      }
      
      10
      # )]
      
      executed in 4ms, finished 04:51:42 2019-08-15
      . . .
      1
       
      1
      ## Rewriting the source to track control flow and taints.
      

      1.4  Rewriting the source to track control flow and taints.¶

      1
       
      1
      We rewrite the source so that `asring in value` gets converted to `taint_wrap__(astring).in_(value)`. Note that what we are tracking is not really taints, but rather _character accesses_ to the origin string.
      

      We rewrite the source so that asring in value gets converted to taint_wrap__(astring).in_(value). Note that what we are tracking is not really taints, but rather character accesses to the origin string.

      1
       
      1
      We also rewrite the methods so that method bodies are enclosed in a `method__` context manager, any `if`conditions and `while` loops (only `while` for now) are enclosed in an outer `stack__` and inner `scope__` context manager. This lets us track when the corresponding scopes are entered into and left.
      

      We also rewrite the methods so that method bodies are enclosed in a method__ context manager, any ifconditions and while loops (only while for now) are enclosed in an outer stack__ and inner scope__ context manager. This lets us track when the corresponding scopes are entered into and left.

      In [34]:
      xxxxxxxxxx
      
      2
       
      1
      import ast
      
      2
      import astor
      
      executed in 10ms, finished 04:51:42 2019-08-15
      . . .
      2
       
      1
      ### InRewriter
      
      2
      The `InRewriter` class handles transforming `in` statements so that taints can be tracked. It has two methods. The `wrap()` method transforms any `a in lst` calls to `taint_wrap__(a) in lst`.
      

      1.4.1  InRewriter¶

      The InRewriter class handles transforming in statements so that taints can be tracked. It has two methods. The wrap() method transforms any a in lst calls to taint_wrap__(a) in lst.

      In [35]:
      xxxxxxxxxx
      
      3
       
      1
      class InRewriter(ast.NodeTransformer):
      
      2
          def wrap(self, node):
      
      3
              return ast.Call(func=ast.Name(id='taint_wrap__', ctx=ast.Load()), args=[node], keywords=[])
      
      executed in 5ms, finished 04:51:42 2019-08-15
      . . .
      1
       
      1
      The `wrap()` method is internally used by `visit_Compare()` method to transform `a in lst` to `taint_wrap__(a).in_(lst)`. We need to do this because Python ties the overriding of `in` operator to the `__contains__()` method in the class of `lst`. In our case, however, very often `a` is the element tainted and hence proxied. Hence we need a method invoked on the `a` object.
      

      The wrap() method is internally used by visit_Compare() method to transform a in lst to taint_wrap__(a).in_(lst). We need to do this because Python ties the overriding of in operator to the __contains__() method in the class of lst. In our case, however, very often a is the element tainted and hence proxied. Hence we need a method invoked on the a object.

      In [36]:
      xxxxxxxxxx
      
      12
       
      1
      class InRewriter(InRewriter):
      
      2
          def visit_Compare(self, tree_node):
      
      3
              left = tree_node.left
      
      4
              if not tree_node.ops or not isinstance(tree_node.ops[0], ast.In):
      
      5
                  return tree_node
      
      6
              mod_val = ast.Call(
      
      7
                  func=ast.Attribute(
      
      8
                      value=self.wrap(left),
      
      9
                      attr='in_'),
      
      10
                  args=tree_node.comparators,
      
      11
                  keywords=[])
      
      12
              return mod_val
      
      executed in 7ms, finished 04:51:42 2019-08-15
      . . .
      1
       
      1
      Tying it together.
      

      Tying it together.

      In [37]:
      xxxxxxxxxx
      
      4
       
      1
      def rewrite_in(src):
      
      2
          v = ast.fix_missing_locations(InRewriter().visit(ast.parse(src)))
      
      3
          source = astor.to_source(v)
      
      4
          return "%s" % source
      
      executed in 3ms, finished 04:51:42 2019-08-15
      . . .
      1
       
      1
      ####  Using It
      

      1.4.1.1  Using It¶

      In [38]:
      xxxxxxxxxx
      
      1
       
      1
      from fuzzingbook.fuzzingbook_utils import print_content
      
      executed in 137ms, finished 04:51:43 2019-08-15
      . . .
      In [39]:
      xxxxxxxxxx
      
      1
       
      1
      %top print_content(rewrite_in('s in ["a", "b", "c"]'))
      
      executed in 267ms, finished 04:51:43 2019-08-15
      taint_wrap__(s).in_(['a', 'b', 'c'])
      
      . . .
      3
       
      1
      ### Rewriter
      
      2
      ​
      
      3
      The `Rewriter` class handles inserting tracing probes into methods and control structures. Essentially, we insert a `with` scope for the method body, and a `with` scope outside both `while` and `if` scopes. Finally, we insert a `with` scope inside the `while` and `if` scopes. IMPORTANT: We only implement the `while` context. Similar should be implemented for the `for` context.
      

      1.4.2  Rewriter¶

      The Rewriter class handles inserting tracing probes into methods and control structures. Essentially, we insert a with scope for the method body, and a with scope outside both while and if scopes. Finally, we insert a with scope inside the while and if scopes. IMPORTANT: We only implement the while context. Similar should be implemented for the for context.

      1
       
      1
      #### The method context wrapper
      

      1.4.2.1  The method context wrapper¶

      1
       
      1
      A few counters to provide unique identifiers for context managers. Essentially, we number each if and while that we see.
      

      A few counters to provide unique identifiers for context managers. Essentially, we number each if and while that we see.

      In [40]:
      xxxxxxxxxx
      
      4
       
      1
      class Rewriter(InRewriter):
      
      2
          def init_counters(self):
      
      3
              self.if_counter = 0
      
      4
              self.while_counter = 0
      
      executed in 7ms, finished 04:51:43 2019-08-15
      . . .
      1
       
      1
      The `methods[]` array is used to keep track of the current method stack during execution. Epsilon and NoEpsilon are simply constants that I use to indicate whether an IF or a Loop is nullable or not. If it is nullable, I mark it with Epsilon.
      

      The methods[] array is used to keep track of the current method stack during execution. Epsilon and NoEpsilon are simply constants that I use to indicate whether an IF or a Loop is nullable or not. If it is nullable, I mark it with Epsilon.

      In [41]:
      xxxxxxxxxx
      
      3
       
      1
      methods = []
      
      2
      Epsilon = '-'
      
      3
      NoEpsilon = '='
      
      executed in 6ms, finished 04:51:43 2019-08-15
      . . .
      1
       
      1
      The `wrap_in_method()` generates a wrapper for method definitions.
      

      The wrap_in_method() generates a wrapper for method definitions.

      In [42]:
      xxxxxxxxxx
      
      7
       
      1
      class Rewriter(Rewriter):
      
      2
          def wrap_in_method(self, body, args):
      
      3
              method_name_expr = ast.Str(methods[-1])
      
      4
              my_args = ast.List(args.args, ast.Load())
      
      5
              args = [method_name_expr, my_args]
      
      6
              scope_expr = ast.Call(func=ast.Name(id='method__', ctx=ast.Load()), args=args, keywords=[])
      
      7
              return [ast.With(items=[ast.withitem(scope_expr, ast.Name(id='_method__'))], body=body)]
      
      executed in 10ms, finished 04:51:43 2019-08-15
      . . .
      1
       
      1
      The method `visit_FunctionDef()` is the method rewriter that actually does the job.
      

      The method visit_FunctionDef() is the method rewriter that actually does the job.

      In [43]:
      xxxxxxxxxx
      
      7
       
      1
      class Rewriter(Rewriter):
      
      2
          def visit_FunctionDef(self, tree_node):
      
      3
              self.init_counters()
      
      4
              methods.append(tree_node.name)
      
      5
              self.generic_visit(tree_node)
      
      6
              tree_node.body = self.wrap_in_method(tree_node.body, tree_node.args)
      
      7
              return tree_node
      
      executed in 8ms, finished 04:51:43 2019-08-15
      . . .
      1
       
      1
      The `rewrite_def()` method wraps the function definitions in scopes.
      

      The rewrite_def() method wraps the function definitions in scopes.

      In [44]:
      xxxxxxxxxx
      
      3
       
      1
      def rewrite_def(src):
      
      2
          v = ast.fix_missing_locations(Rewriter().visit(ast.parse(src)))
      
      3
          return astor.to_source(v)
      
      executed in 7ms, finished 04:51:43 2019-08-15
      . . .
      1
       
      1
      We can use it as follows:
      

      We can use it as follows:

      In [45]:
      xxxxxxxxxx
      
      1
       
      1
      %top print_content(rewrite_def('\n'.join(program_src['calculator.py'].split('\n')[12:19])), 'calculator.py')
      
      executed in 89ms, finished 04:51:43 2019-08-15
      def parse_paren(s, i):
          with method__('parse_paren', [s, i]) as _method__:
              assert s[i] == '('
              i, v = parse_expr(s, i + 1)
              if s[i:] == '':
                  raise Exception(s, i)
              assert s[i] == ')'
      
      . . .
      3
       
      1
      #### The stack wrapper
      
      2
      ​
      
      3
      The method `wrap_in_outer()` adds a `with ..stack..()` context _outside_ the control structures. The stack is used to keep track of the current control structure stack for any character comparison made. Notice the `can_empty` parameter. This indicates that the particular structure is _nullable_. For `if` we can make the condition right away. For `while` we postpone the decision.
      

      1.4.2.2  The stack wrapper¶

      The method wrap_in_outer() adds a with ..stack..() context outside the control structures. The stack is used to keep track of the current control structure stack for any character comparison made. Notice the can_empty parameter. This indicates that the particular structure is nullable. For if we can make the condition right away. For while we postpone the decision.

      In [46]:
      xxxxxxxxxx
      
      12
       
      1
      class Rewriter(Rewriter):
      
      2
          def wrap_in_outer(self, name, can_empty, counter, node):
      
      3
              name_expr = ast.Str(name)
      
      4
              can_empty_expr = ast.Str(can_empty)
      
      5
              counter_expr = ast.Num(counter)
      
      6
              method_id = ast.Name(id='_method__')
      
      7
              args = [name_expr, counter_expr, method_id, can_empty_expr]
      
      8
              scope_expr = ast.Call(func=ast.Name(id='stack__', ctx=ast.Load()),
      
      9
                      args=args, keywords=[])
      
      10
              return ast.With(
      
      11
                  items=[ast.withitem(scope_expr, ast.Name(id='%s_%d_stack__' % (name, counter)))], 
      
      12
                  body=[node])
      
      executed in 12ms, finished 04:51:43 2019-08-15
      . . .
      2
       
      1
      #### The scope wrapper
      
      2
      The method `wrap_in_inner()` adds a `with ...scope..()` context immediately inside the control structure. For `while`, this means simply adding one `with ...scope..()` just before the first line. For `if`, this means adding one `with ...scope...()` each to each branch of the `if` condition.
      

      1.4.2.3  The scope wrapper¶

      The method wrap_in_inner() adds a with ...scope..() context immediately inside the control structure. For while, this means simply adding one with ...scope..() just before the first line. For if, this means adding one with ...scope...() each to each branch of the if condition.

      In [47]:
      xxxxxxxxxx
      
      11
       
      1
      class Rewriter(Rewriter):
      
      2
          def wrap_in_inner(self, name, counter, val, body):
      
      3
              val_expr = ast.Num(val)
      
      4
              stack_iter = ast.Name(id='%s_%d_stack__' % (name, counter))
      
      5
              method_id = ast.Name(id='_method__')
      
      6
              args = [val_expr, stack_iter, method_id]
      
      7
              scope_expr = ast.Call(func=ast.Name(id='scope__', ctx=ast.Load()),
      
      8
                      args=args, keywords=[])
      
      9
              return [ast.With(
      
      10
                  items=[ast.withitem(scope_expr, ast.Name(id='%s_%d_%d_scope__' % (name, counter, val)))], 
      
      11
                  body=body)]
      
      executed in 7ms, finished 04:51:43 2019-08-15
      . . .
      3
       
      1
      #### Rewriting `If` conditions
      
      2
      ​
      
      3
      While rewriting if conditions, we have to take care of the cascading if conditions (`elsif`), which is represented as nested if conditions in AST. They do not require separate `stack` context, but only separate `scope` contexts.
      

      1.4.2.4  Rewriting If conditions¶

      While rewriting if conditions, we have to take care of the cascading if conditions (elsif), which is represented as nested if conditions in AST. They do not require separate stack context, but only separate scope contexts.

      In [48]:
      xxxxxxxxxx
      
      17
       
      1
      class Rewriter(Rewriter):
      
      2
          def process_if(self, tree_node, counter, val=None):
      
      3
              if val is None: val = 0
      
      4
              else: val += 1
      
      5
              if_body = []
      
      6
              self.generic_visit(tree_node.test)
      
      7
              for node in tree_node.body: self.generic_visit(node)
      
      8
              tree_node.body = self.wrap_in_inner('if', counter, val, tree_node.body)
      
      9
      ​
      
      10
              # else part.
      
      11
              if len(tree_node.orelse) == 1 and isinstance(tree_node.orelse[0], ast.If):
      
      12
                  self.process_if(tree_node.orelse[0], counter, val)
      
      13
              else:
      
      14
                  if tree_node.orelse:
      
      15
                      val += 1
      
      16
                      for node in tree_node.orelse: self.generic_visit(node)
      
      17
                      tree_node.orelse = self.wrap_in_inner('if', counter, val, tree_node.orelse)
      
      executed in 12ms, finished 04:51:43 2019-08-15
      . . .
      1
       
      1
      While rewriting `if` conditions, we have to take care of the cascading `if` conditions, which is represented as nested `if` conditions in AST. We need to identify whether the cascading `if` conditions (`elsif`) have an empty `orelse` clause or not. If it has an empty `orelse`, then the entire set of `if` conditions may be excised, and still produce a valid value. Hence, it should be marked as optional. The `visit_If()` checks if the cascading `ifs` have an `orelse` or not.  
      

      While rewriting if conditions, we have to take care of the cascading if conditions, which is represented as nested if conditions in AST. We need to identify whether the cascading if conditions (elsif) have an empty orelse clause or not. If it has an empty orelse, then the entire set of if conditions may be excised, and still produce a valid value. Hence, it should be marked as optional. The visit_If() checks if the cascading ifs have an orelse or not.

      In [49]:
      xxxxxxxxxx
      
      19
       
      1
      class Rewriter(Rewriter):
      
      2
          def visit_If(self, tree_node):
      
      3
              self.if_counter += 1
      
      4
              counter = self.if_counter
      
      5
              #is it empty
      
      6
              start = tree_node
      
      7
              while start:
      
      8
                  if isinstance(start, ast.If):
      
      9
                      if not start.orelse:
      
      10
                          start = None
      
      11
                      elif len(start.orelse) == 1:
      
      12
                          start = start.orelse[0]
      
      13
                      else:
      
      14
                          break
      
      15
                  else:
      
      16
                      break
      
      17
              self.process_if(tree_node, counter=self.if_counter)
      
      18
              can_empty = NoEpsilon if start else Epsilon  # NoEpsilon for + and Epsilon for *
      
      19
              return self.wrap_in_outer('if', can_empty, counter, tree_node)
      
      executed in 10ms, finished 04:51:43 2019-08-15
      . . .
      3
       
      1
      #### Rewriting `while` loops
      
      2
      ​
      
      3
      Rewriting while loops are simple. We wrap them in `stack` and `scope` contexts. We do not implement the `orelse` feature yet.
      

      1.4.2.5  Rewriting while loops¶

      Rewriting while loops are simple. We wrap them in stack and scope contexts. We do not implement the orelse feature yet.

      In [50]:
      xxxxxxxxxx
      
      10
       
      1
      class Rewriter(Rewriter):
      
      2
          def visit_While(self, tree_node):
      
      3
              self.generic_visit(tree_node)
      
      4
              self.while_counter += 1
      
      5
              counter = self.while_counter
      
      6
              test = tree_node.test
      
      7
              body = tree_node.body
      
      8
              assert not tree_node.orelse
      
      9
              tree_node.body = self.wrap_in_inner('while', counter, 0, body)
      
      10
              return self.wrap_in_outer('while', '?', counter, tree_node)
      
      executed in 7ms, finished 04:51:43 2019-08-15
      . . .
      1
       
      1
      #### Combining both
      

      1.4.2.6  Combining both¶

      In [51]:
      xxxxxxxxxx
      
      3
       
      1
      def rewrite_cf(src, original):
      
      2
          v = ast.fix_missing_locations(Rewriter().visit(ast.parse(src)))
      
      3
          return astor.to_source(v)
      
      executed in 7ms, finished 04:51:43 2019-08-15
      . . .
      1
       
      1
      An example with `if` conditions.
      

      An example with if conditions.

      In [52]:
      xxxxxxxxxx
      
      1
       
      1
      %top print_content('\n'.join(program_src['calculator.py'].split('\n')[12:19]), 'calculator.py')
      
      executed in 15ms, finished 04:51:43 2019-08-15
      def parse_paren(s, i):
          assert s[i] == '('
          i, v = parse_expr(s, i+1)
          if s[i:] == '':
              raise Exception(s, i)
          assert s[i] == ')'
      
      . . .
      In [53]:
      xxxxxxxxxx
      
      1
       
      1
      %top print_content(rewrite_cf('\n'.join(program_src['calculator.py'].split('\n')[12:19]), 'calculator.py').strip(), filename='calculator.py')
      
      executed in 19ms, finished 04:51:43 2019-08-15
      def parse_paren(s, i):
          with method__('parse_paren', [s, i]) as _method__:
              assert s[i] == '('
              i, v = parse_expr(s, i + 1)
              with stack__('if', 1, _method__, '-') as if_1_stack__:
                  if s[i:] == '':
                      with scope__(0, if_1_stack__, _method__) as if_1_0_scope__:
                          raise Exception(s, i)
              assert s[i] == ')'
      
      . . .
      1
       
      1
      An example with `while` loops.
      

      An example with while loops.

      In [54]:
      xxxxxxxxxx
      
      1
       
      1
      %top print_content('\n'.join(program_src['calculator.py'].split('\n')[5:11]), 'calculator.py')
      
      executed in 13ms, finished 04:51:43 2019-08-15
          
      def parse_num(s,i):
          n = ''
          while s[i:] and is_digit(s[i]):
              n += s[i]
              i = i +1
      
      . . .
      In [55]:
      xxxxxxxxxx
      
      1
       
      1
      %top print_content(rewrite_cf('\n'.join(program_src['calculator.py'].split('\n')[5:11]), 'calculator.py'), filename='calculator.py')
      
      executed in 14ms, finished 04:51:43 2019-08-15
      def parse_num(s, i):
          with method__('parse_num', [s, i]) as _method__:
              n = ''
              with stack__('while', 1, _method__, '?') as while_1_stack__:
                  while s[i:] and is_digit(s[i]):
                      with scope__(0, while_1_stack__, _method__
                          ) as while_1_0_scope__:
                          n += s[i]
                          i = i + 1
      
      . . .
      3
       
      1
      #### Generating the complete instrumented source
      
      2
      ​
      
      3
      For the complete instrumented source, we need to first make sure that all necessary imports are satisfied. Next, we also need to invoke the parser with the necessary tainted input and output the trace.
      

      1.4.2.7  Generating the complete instrumented source¶

      For the complete instrumented source, we need to first make sure that all necessary imports are satisfied. Next, we also need to invoke the parser with the necessary tainted input and output the trace.

      In [56]:
      xxxxxxxxxx
      
      34
       
      1
      def rewrite(src, original):
      
      2
          src = ast.fix_missing_locations(InRewriter().visit(ast.parse(src)))
      
      3
          v = ast.fix_missing_locations(Rewriter().visit(ast.parse(src)))
      
      4
          header = """
      
      5
      from mimid_context import scope__, stack__, method__
      
      6
      import json
      
      7
      import sys
      
      8
      import taints
      
      9
      from taints import taint_wrap__
      
      10
          """
      
      11
          source = astor.to_source(v)
      
      12
          footer = """
      
      13
      if __name__ == "__main__":
      
      14
          js = []
      
      15
          for arg in sys.argv[1:]:
      
      16
              with open(arg) as f:
      
      17
                  mystring = f.read().strip().replace('\\n', ' ')
      
      18
              taints.trace_init()
      
      19
              tainted_input = taints.wrap_input(mystring)
      
      20
              main(tainted_input)
      
      21
              assert tainted_input.comparisons
      
      22
              j = {
      
      23
              'comparisons_fmt': 'idx, char, method_call_id',
      
      24
              'comparisons':taints.convert_comparisons(tainted_input.comparisons, mystring),
      
      25
              'method_map_fmt': 'method_call_id, method_name, children',
      
      26
              'method_map': taints.convert_method_map(taints.METHOD_MAP),
      
      27
              'inputstr': mystring,
      
      28
              'original': %s,
      
      29
              'arg': arg}
      
      30
              js.append(j)
      
      31
          print(json.dumps(js))
      
      32
      """
      
      33
          footer = footer % repr(original)
      
      34
          return "%s\n%s\n%s" % (header, source, footer)
      
      executed in 9ms, finished 04:51:43 2019-08-15
      . . .
      1
       
      1
      #### Using It
      

      1.4.2.8  Using It¶

      In [57]:
      xxxxxxxxxx
      
      1
       
      1
      %top calc_parse_rewritten = rewrite(program_src['calculator.py'], original='calculator.py')
      
      executed in 11ms, finished 04:51:43 2019-08-15
      . . .
      In [58]:
      xxxxxxxxxx
      
      1
       
      1
      %top print_content(calc_parse_rewritten, filename='calculator.py')
      
      executed in 23ms, finished 04:51:43 2019-08-15
      from mimid_context import scope__, stack__, method__
      import json
      import sys
      import taints
      from taints import taint_wrap__
          
      import string
      
      
      def is_digit(i):
          with method__('is_digit', [i]) as _method__:
              return taint_wrap__(i).in_(list(string.digits))
      
      
      def parse_num(s, i):
          with method__('parse_num', [s, i]) as _method__:
              n = ''
              with stack__('while', 1, _method__, '?') as while_1_stack__:
                  while s[i:] and is_digit(s[i]):
                      with scope__(0, while_1_stack__, _method__
                          ) as while_1_0_scope__:
                          n += s[i]
                          i = i + 1
              return i, n
      
      
      def parse_paren(s, i):
          with method__('parse_paren', [s, i]) as _method__:
              assert s[i] == '('
              i, v = parse_expr(s, i + 1)
              with stack__('if', 1, _method__, '-') as if_1_stack__:
                  if s[i:] == '':
                      with scope__(0, if_1_stack__, _method__) as if_1_0_scope__:
                          raise Exception(s, i)
              assert s[i] == ')'
              return i + 1, v
      
      
      def parse_expr(s, i=0):
          with method__('parse_expr', [s, i]) as _method__:
              expr = []
              is_op = True
              with stack__('while', 1, _method__, '?') as while_1_stack__:
                  while s[i:]:
                      with scope__(0, while_1_stack__, _method__
                          ) as while_1_0_scope__:
                          c = s[i]
                          with stack__('if', 1, _method__, '=') as if_1_stack__:
                              if taint_wrap__(c).in_(list(string.digits)):
                                  with scope__(0, if_1_stack__, _method__
                                      ) as if_1_0_scope__:
                                      if not is_op:
                                          raise Exception(s, i)
                                      i, num = parse_num(s, i)
                                      expr.append(num)
                                      is_op = False
                              elif taint_wrap__(c).in_(['+', '-', '*', '/']):
                                  with scope__(1, if_1_stack__, _method__
                                      ) as if_1_1_scope__:
                                      if is_op:
                                          raise Exception(s, i)
                                      expr.append(c)
                                      is_op = True
                                      i = i + 1
                              elif c == '(':
                                  with scope__(2, if_1_stack__, _method__
                                      ) as if_1_2_scope__:
                                      if not is_op:
                                          raise Exception(s, i)
                                      i, cexpr = parse_paren(s, i)
                                      expr.append(cexpr)
                                      is_op = False
                              elif c == ')':
                                  with scope__(3, if_1_stack__, _method__
                                      ) as if_1_3_scope__:
                                      break
                              else:
                                  with scope__(4, if_1_stack__, _method__
                                      ) as if_1_4_scope__:
                                      raise Exception(s, i)
              with stack__('if', 2, _method__, '-') as if_2_stack__:
                  if is_op:
                      with scope__(0, if_2_stack__, _method__) as if_2_0_scope__:
                          raise Exception(s, i)
              return i, expr
      
      
      def main(arg):
          with method__('main', [arg]) as _method__:
              return parse_expr(arg)
      
      
      if __name__ == "__main__":
          js = []
          for arg in sys.argv[1:]:
              with open(arg) as f:
                  mystring = f.read().strip().replace('\n', ' ')
              taints.trace_init()
              tainted_input = taints.wrap_input(mystring)
              main(tainted_input)
              assert tainted_input.comparisons
              j = {
              'comparisons_fmt': 'idx, char, method_call_id',
              'comparisons':taints.convert_comparisons(tainted_input.comparisons, mystring),
              'method_map_fmt': 'method_call_id, method_name, children',
              'method_map': taints.convert_method_map(taints.METHOD_MAP),
              'inputstr': mystring,
              'original': 'calculator.py',
              'arg': arg}
              js.append(j)
          print(json.dumps(js))
      
      . . .
      1
       
      1
      ### Generate Transformed Sources
      

      1.4.3  Generate Transformed Sources¶

      1
       
      1
      We will now write the transformed sources.
      

      We will now write the transformed sources.

      In [59]:
      xxxxxxxxxx
      
      1
       
      1
      do(['mkdir','-p','build','subjects','samples']).returncode
      
      executed in 14ms, finished 04:51:43 2019-08-15
      Out[59]:
      0
      
      . . .
      In [60]:
      xxxxxxxxxx
      
      8
       
      1
      # [(
      
      2
      for file_name in program_src:
      
      3
          print(file_name)
      
      4
          with open("subjects/%s" % file_name, 'wb+') as f:
      
      5
              f.write(program_src[file_name].encode('utf-8'))
      
      6
          with open("build/%s" % file_name, 'w+') as f:
      
      7
              f.write(rewrite(program_src[file_name], file_name))
      
      8
      # )]
      
      executed in 208ms, finished 04:51:43 2019-08-15
      calculator.py
      mathexpr.py
      urlparse.py
      netrc.py
      cgidecode.py
      microjson.py
      
      . . .
      3
       
      1
      ### Context Mangers
      
      2
      ​
      
      3
      The context managers are probes inserted into the source code so that we know when execution enters and exits specific control flow structures such as conditionals and loops. Note that source code for these probes are not really a requirement. They can be inserted directly on binaries too, or even dynamically inserted using tools such as `dtrace`. For now, we make our life simple using AST editing.
      

      1.4.4  Context Mangers¶

      The context managers are probes inserted into the source code so that we know when execution enters and exits specific control flow structures such as conditionals and loops. Note that source code for these probes are not really a requirement. They can be inserted directly on binaries too, or even dynamically inserted using tools such as dtrace. For now, we make our life simple using AST editing.

      2
       
      1
      #### Method context
      
      2
      The `method__` context handles the assignment of method name, as well as storing the method stack.
      

      1.4.4.1  Method context¶

      The method__ context handles the assignment of method name, as well as storing the method stack.

      In [338]:
      xxxxxxxxxx
      
      30
       
      1
      %%var mimid_method_context
      
      2
      # [(
      
      3
      import taints
      
      4
      import urllib.parse
      
      5
      def to_key(method, name, num):
      
      6
          return '%s:%s_%s' % (method, name, num)
      
      7
      ​
      
      8
      class method__:
      
      9
          def __init__(self, name, args):
      
      10
              if not taints.METHOD_NUM_STACK: return
      
      11
              self.args = '_'.join([urllib.parse.quote(i) for i in args if type(i) == str])
      
      12
              if not self.args:
      
      13
                  self.name = name
      
      14
              else:
      
      15
                  self.name = "%s__%s" % (name, self.args) # <- not for now #TODO
      
      16
              if args and hasattr(args[0], 'tag'):
      
      17
                  self.name = "%s:%s" % (args[0].tag, self.name)
      
      18
              taints.trace_call(self.name)
      
      19
      ​
      
      20
          def __enter__(self):
      
      21
              if not taints.METHOD_NUM_STACK: return
      
      22
              taints.trace_set_method(self.name)
      
      23
              self.stack = []
      
      24
              return self
      
      25
      ​
      
      26
          def __exit__(self, *args):
      
      27
              if not taints.METHOD_NUM_STACK: return
      
      28
              taints.trace_return()
      
      29
              taints.trace_set_method(self.name)
      
      30
      # )]
      
      executed in 5ms, finished 06:25:49 2019-08-15
      . . .
      1
       
      1
      #### Stack context
      

      1.4.4.2  Stack context¶

      1
       
      1
      The stack context stores the current prefix and handles updating the stack that is stored at the method context. 
      

      The stack context stores the current prefix and handles updating the stack that is stored at the method context.

      In [348]:
      xxxxxxxxxx
      
      24
       
      1
      %%var mimid_stack_context
      
      2
      # [(
      
      3
      class stack__:
      
      4
          def __init__(self, name, num, method_i, can_empty):
      
      5
              if not taints.METHOD_NUM_STACK: return
      
      6
              self.method_stack = method_i.stack
      
      7
              self.can_empty = can_empty # * means yes. + means no, ? means to be determined
      
      8
              self.name, self.num, self.method = name, num, method_i.name
      
      9
              self.prefix = to_key(self.method, self.name, self.num)
      
      10
      ​
      
      11
          def __enter__(self):
      
      12
              if not taints.METHOD_NUM_STACK: return
      
      13
              if self.name in {'while'}:
      
      14
                  self.method_stack.append(0)
      
      15
              elif self.name in {'if'}:
      
      16
                  self.method_stack.append(-1)
      
      17
              else:
      
      18
                  assert False
      
      19
              return self
      
      20
      ​
      
      21
          def __exit__(self, *args):
      
      22
              if not taints.METHOD_NUM_STACK: return
      
      23
              self.method_stack.pop()
      
      24
      # )]
      
      executed in 6ms, finished 06:26:43 2019-08-15
      . . .
      2
       
      1
      #### Scope context
      
      2
      The scope context correctly identifies when the control structure is entered into, and exited (in case of loops) and the alternative entered int (in case of if conditions).
      

      1.4.4.3  Scope context¶

      The scope context correctly identifies when the control structure is entered into, and exited (in case of loops) and the alternative entered int (in case of if conditions).

      In [349]:
      xxxxxxxxxx
      
      31
       
      1
      %%var mimid_scope_context
      
      2
      # [(
      
      3
      import json
      
      4
      class scope__:
      
      5
          def __init__(self, alt, stack_i, method_i):
      
      6
              if not taints.METHOD_NUM_STACK: return
      
      7
              self.name, self.num, self.method, self.alt = stack_i.name, stack_i.num, stack_i.method, alt
      
      8
              self.method_stack = method_i.stack
      
      9
              self.can_empty = stack_i.can_empty
      
      10
      ​
      
      11
          def __enter__(self):
      
      12
              if not taints.METHOD_NUM_STACK: return
      
      13
              if self.name in {'while'}:
      
      14
                  self.method_stack[-1] += 1
      
      15
              elif self.name in {'if'}:
      
      16
                  pass
      
      17
              else:
      
      18
                  assert False, self.name
      
      19
              uid = json.dumps(self.method_stack)
      
      20
              if self.name in {'while'}:
      
      21
                  taints.trace_call('%s:%s_%s %s %s' % (self.method, self.name, self.num, self.can_empty, uid))
      
      22
              else:
      
      23
                  taints.trace_call('%s:%s_%s %s %s#%s' % (self.method, self.name, self.num, self.can_empty, self.alt, uid))
      
      24
              taints.trace_set_method(self.name)
      
      25
              return self
      
      26
      ​
      
      27
          def __exit__(self, *args):
      
      28
              if not taints.METHOD_NUM_STACK: return
      
      29
              taints.trace_return()
      
      30
              taints.trace_set_method(self.name)
      
      31
      # )]
      
      executed in 3ms, finished 06:26:46 2019-08-15
      . . .
      3
       
      1
      ### Taint Tracker
      
      2
      ​
      
      3
      The taint tracker is essentially a reimplementation of the information flow taints from the Fuzzingbook. It incorporates tracing of character accesses. IMPORTANT: Not all methods are implemented.
      

      1.4.5  Taint Tracker¶

      The taint tracker is essentially a reimplementation of the information flow taints from the Fuzzingbook. It incorporates tracing of character accesses. IMPORTANT: Not all methods are implemented.

      In [350]:
      xxxxxxxxxx
      
      655
       
      1
      %%var taints_src↔​
      
      executed in 8ms, finished 06:26:49 2019-08-15
      . . .
      1
       
      1
      We write both files to the appropriate locations.
      

      We write both files to the appropriate locations.

      In [351]:
      xxxxxxxxxx
      
      9
       
      1
      # [(
      
      2
      with open('build/mimid_context.py', 'w+') as f:
      
      3
          print(VARS['mimid_method_context'], file=f)
      
      4
          print(VARS['mimid_stack_context'], file=f)
      
      5
          print(VARS['mimid_scope_context'], file=f)
      
      6
      ​
      
      7
      with open('build/taints.py', 'w+') as f:
      
      8
          print(VARS['taints_src'], file=f)
      
      9
      # )]
      
      executed in 5ms, finished 06:26:50 2019-08-15
      . . .
      1
       
      1
      ## Generating Traces
      

      1.5  Generating Traces¶

      1
       
      1
      Here is how one can generate traces for the `calc` program.
      

      Here is how one can generate traces for the calc program.

      In [66]:
      xxxxxxxxxx
      
      1
       
      1
      %top do(['mkdir','-p','samples/calc']).returncode
      
      executed in 17ms, finished 04:51:44 2019-08-15
      Out[66]:
      0
      
      . . .
      In [67]:
      xxxxxxxxxx
      
      1
       
      1
      %top do(['mkdir','-p','samples/mathexpr']).returncode
      
      executed in 17ms, finished 04:51:44 2019-08-15
      Out[67]:
      0
      
      . . .
      In [68]:
      xxxxxxxxxx
      
      11
       
      1
      %%top
      
      2
      # [(
      
      3
      with open('samples/calc/0.csv', 'w+') as f:
      
      4
          print('9-(16+72)*3/458', file=f)
      
      5
          
      
      6
      with open('samples/calc/1.csv', 'w+') as f:
      
      7
          print('(9)+3/4/58', file=f)
      
      8
          
      
      9
      with open('samples/calc/2.csv', 'w+') as f:
      
      10
          print('8*3/40', file=f)
      
      11
      # )]
      
      executed in 9ms, finished 04:51:44 2019-08-15
      . . .
      1
       
      1
      Generating traces on `mathexpr`.
      

      Generating traces on mathexpr.

      In [69]:
      xxxxxxxxxx
      
      11
       
      1
      %%top
      
      2
      # [(
      
      3
      with open('samples/mathexpr/0.csv', 'w+') as f:
      
      4
          print('100', file=f)
      
      5
          
      
      6
      with open('samples/mathexpr/1.csv', 'w+') as f:
      
      7
          print('2 + 3', file=f)
      
      8
          
      
      9
      with open('samples/mathexpr/2.csv', 'w+') as f:
      
      10
          print('4 * 5', file=f)
      
      11
      # )]
      
      executed in 9ms, finished 04:51:44 2019-08-15
      . . .
      In [70]:
      xxxxxxxxxx
      
      1
       
      1
      %top calc_trace_out = do("python build/calculator.py samples/calc/*.csv", shell=True).stdout
      
      executed in 92ms, finished 04:51:44 2019-08-15
      . . .
      In [71]:
      xxxxxxxxxx
      
      1
       
      1
      %top mathexpr_trace_out = do("python build/mathexpr.py samples/mathexpr/*.csv", shell=True).stdout
      
      executed in 70ms, finished 04:51:44 2019-08-15
      . . .
      In [72]:
      xxxxxxxxxx
      
      1
       
      1
      import json
      
      executed in 6ms, finished 04:51:44 2019-08-15
      . . .
      In [73]:
      xxxxxxxxxx
      
      1
       
      1
      %top calc_trace = json.loads(calc_trace_out)
      
      executed in 8ms, finished 04:51:44 2019-08-15
      . . .
      In [74]:
      xxxxxxxxxx
      
      1
       
      1
      %top mathexpr_trace = json.loads(mathexpr_trace_out)
      
      executed in 6ms, finished 04:51:44 2019-08-15
      . . .
      1
       
      1
      ## Mining the Traces Generated
      

      1.6  Mining the Traces Generated¶

      7
       
      1
      ### Reconstructing the Method Tree with Attached Character Comparisons
      
      2
      ​
      
      3
      Reconstruct the actual method trace from a trace with the following
      
      4
      format
      
      5
      ```
      
      6
      key   : [ mid, method_name, children_ids ]
      
      7
      ```
      

      1.6.1  Reconstructing the Method Tree with Attached Character Comparisons¶

      Reconstruct the actual method trace from a trace with the following format

      key   : [ mid, method_name, children_ids ]
      
      In [75]:
      xxxxxxxxxx
      
      24
       
      1
      def reconstruct_method_tree(method_map):
      
      2
          first_id = None
      
      3
          tree_map = {}
      
      4
          for key in method_map:
      
      5
              m_id, m_name, m_children = method_map[key]
      
      6
              children = []
      
      7
              if m_id in tree_map:
      
      8
                  # just update the name and children
      
      9
                  assert not tree_map[m_id]
      
      10
                  tree_map[m_id]['id'] = m_id
      
      11
                  tree_map[m_id]['name'] = m_name
      
      12
                  tree_map[m_id]['indexes'] = []
      
      13
                  tree_map[m_id]['children'] = children
      
      14
              else:
      
      15
                  assert first_id is None
      
      16
                  tree_map[m_id] = {'id': m_id, 'name': m_name, 'children': children, 'indexes': []}
      
      17
                  first_id = m_id
      
      18
      ​
      
      19
              for c in m_children:
      
      20
                  assert c not in tree_map
      
      21
                  val = {}
      
      22
                  tree_map[c] = val
      
      23
                  children.append(val)
      
      24
          return first_id, tree_map
      
      executed in 8ms, finished 04:51:44 2019-08-15
      . . .
      1
       
      1
      Here is how one would use it. The first element in the returned tuple is the id of the bottom most method call.
      

      Here is how one would use it. The first element in the returned tuple is the id of the bottom most method call.

      In [76]:
      xxxxxxxxxx
      
      1
       
      1
      from fuzzingbook.GrammarFuzzer import display_tree
      
      executed in 17ms, finished 04:51:44 2019-08-15
      . . .
      In [77]:
      xxxxxxxxxx
      
      1
       
      1
      %top first, calc_method_tree1 = reconstruct_method_tree(calc_trace[0]['method_map'])
      
      executed in 7ms, finished 04:51:44 2019-08-15
      . . .
      In [78]:
      xxxxxxxxxx
      
      1
       
      1
      %top first, mathexpr_method_tree1 = reconstruct_method_tree(mathexpr_trace[0]['method_map'])
      
      executed in 6ms, finished 04:51:44 2019-08-15
      . . .
      In [79]:
      xxxxxxxxxx
      
      5
       
      1
      def extract_node(node, id):
      
      2
          symbol = str(node['id'])
      
      3
          children = node['children']
      
      4
          annotation = str(node['name'])
      
      5
          return "%s:%s" % (symbol, annotation), children, ''
      
      executed in 7ms, finished 04:51:44 2019-08-15
      . . .
      In [80]:
      xxxxxxxxxx
      
      1
       
      1
      %top v = display_tree(calc_method_tree1[0], extract_node=extract_node)
      
      executed in 12ms, finished 04:51:44 2019-08-15
      . . .
      In [81]:
      xxxxxxxxxx
      
      1
       
      1
      from IPython.display import Image
      
      executed in 5ms, finished 04:51:44 2019-08-15
      . . .
      In [82]:
      xxxxxxxxxx
      
      5
       
      1
      def zoom(v, zoom=True):
      
      2
          # return v directly if you do not want to zoom out.
      
      3
          if zoom:
      
      4
              return Image(v.render(format='png'))
      
      5
          return v
      
      executed in 4ms, finished 04:51:44 2019-08-15
      . . .
      In [83]:
      xxxxxxxxxx
      
      1
       
      1
      %top zoom(v)
      
      executed in 126ms, finished 04:51:44 2019-08-15
      Out[83]:
      . . .
      In [84]:
      xxxxxxxxxx
      
      1
       
      1
      %top zoom(display_tree(mathexpr_method_tree1[0], extract_node=extract_node))
      
      executed in 119ms, finished 04:51:44 2019-08-15
      Out[84]:
      . . .
      2
       
      1
      #### Identifying last comparisons
      
      2
      We need only the last comparisons made on any index. This means that we should care for only the last parse in an ambiguous parse. However, to make concessions for real world, we also check if we are overwriting a child (`HEURISTIC`). Note that `URLParser` is the only parser that needs this heuristic.
      

      1.6.1.1  Identifying last comparisons¶

      We need only the last comparisons made on any index. This means that we should care for only the last parse in an ambiguous parse. However, to make concessions for real world, we also check if we are overwriting a child (HEURISTIC). Note that URLParser is the only parser that needs this heuristic.

      In [85]:
      xxxxxxxxxx
      
      25
       
      1
      def last_comparisons(comparisons):
      
      2
          HEURISTIC = True
      
      3
          last_cmp_only = {}
      
      4
          last_idx = {}
      
      5
      ​
      
      6
          # get the last indexes compared in methods.
      
      7
          for idx, char, mid in comparisons:
      
      8
              if mid in last_idx:
      
      9
                  if idx > last_idx[mid]:
      
      10
                      last_idx[mid] = idx
      
      11
              else:
      
      12
                  last_idx[mid] = idx
      
      13
      ​
      
      14
          for idx, char, mid in comparisons:
      
      15
              if HEURISTIC:
      
      16
                  if idx in last_cmp_only:
      
      17
                      if last_cmp_only[idx] > mid:
      
      18
                          # do not clobber children unless it was the last character
      
      19
                          # for that child.
      
      20
                          if last_idx[mid] > idx:
      
      21
                              # if it was the last index, may be the child used it
      
      22
                              # as a boundary check.
      
      23
                              continue
      
      24
              last_cmp_only[idx] = mid
      
      25
          return last_cmp_only
      
      executed in 8ms, finished 04:51:44 2019-08-15
      . . .
      1
       
      1
      Here is how one would use it.
      

      Here is how one would use it.

      In [86]:
      xxxxxxxxxx
      
      1
       
      1
      %top calc_last_comparisons1 = last_comparisons(calc_trace[0]['comparisons'])
      
      executed in 7ms, finished 04:51:44 2019-08-15
      . . .
      In [87]:
      xxxxxxxxxx
      
      1
       
      1
      %top calc_last_comparisons1
      
      executed in 7ms, finished 04:51:44 2019-08-15
      Out[87]:
      {0: 6,
       1: 9,
       2: 13,
       3: 18,
       4: 20,
       5: 23,
       6: 28,
       7: 30,
       8: 13,
       9: 35,
       10: 40,
       11: 43,
       12: 48,
       13: 50,
       14: 52}
      
      . . .
      In [88]:
      xxxxxxxxxx
      
      1
       
      1
      %top mathexpr_last_comparisons1 = last_comparisons(mathexpr_trace[0]['comparisons'])
      
      executed in 6ms, finished 04:51:44 2019-08-15
      . . .
      In [89]:
      xxxxxxxxxx
      
      1
       
      1
      %top mathexpr_last_comparisons1
      
      executed in 6ms, finished 04:51:44 2019-08-15
      Out[89]:
      {0: 38, 1: 42, 2: 46}
      
      . . .
      2
       
      1
      #### Attaching characters to the tree
      
      2
      Add the comparison indexes to the method tree that we constructed
      

      1.6.1.2  Attaching characters to the tree¶

      Add the comparison indexes to the method tree that we constructed

      In [90]:
      xxxxxxxxxx
      
      4
       
      1
      def attach_comparisons(method_tree, comparisons):
      
      2
          for idx in comparisons:
      
      3
              mid = comparisons[idx]
      
      4
              method_tree[mid]['indexes'].append(idx)
      
      executed in 4ms, finished 04:51:44 2019-08-15
      . . .
      1
       
      1
      Here is how one would use it. Note which method call each input index is associated. For example, the first index is associated with method call id: 6, which corresponds to `is_digit`.
      

      Here is how one would use it. Note which method call each input index is associated. For example, the first index is associated with method call id: 6, which corresponds to is_digit.

      In [91]:
      xxxxxxxxxx
      
      1
       
      1
      %top attach_comparisons(calc_method_tree1, calc_last_comparisons1)
      
      executed in 4ms, finished 04:51:44 2019-08-15
      . . .
      In [92]:
      xxxxxxxxxx
      
      1
       
      1
      %top calc_method_tree1
      
      executed in 52ms, finished 04:51:44 2019-08-15
      Out[92]:
      {0: {'id': 0,
        'name': None,
        'children': [{'id': 1,
          'name': 'main',
          'indexes': [],
          'children': [{'id': 2,
            'name': 'parse_expr',
            'indexes': [],
            'children': [{'id': 3,
              'name': 'parse_expr:while_1 ? [1]',
              'indexes': [],
              'children': [{'id': 4,
                'name': 'parse_expr:if_1 = 0#[1, -1]',
                'indexes': [],
                'children': [{'id': 5,
                  'name': 'parse_num',
                  'indexes': [],
                  'children': [{'id': 6,
                    'name': 'is_digit',
                    'indexes': [0],
                    'children': []},
                   {'id': 7,
                    'name': 'parse_num:while_1 ? [1]',
                    'indexes': [],
                    'children': []},
                   {'id': 8,
                    'name': 'is_digit',
                    'indexes': [],
                    'children': []}]}]}]},
             {'id': 9,
              'name': 'parse_expr:while_1 ? [2]',
              'indexes': [1],
              'children': [{'id': 10,
                'name': 'parse_expr:if_1 = 1#[2, -1]',
                'indexes': [],
                'children': []}]},
             {'id': 11,
              'name': 'parse_expr:while_1 ? [3]',
              'indexes': [],
              'children': [{'id': 12,
                'name': 'parse_expr:if_1 = 2#[3, -1]',
                'indexes': [],
                'children': [{'id': 13,
                  'name': 'parse_paren',
                  'indexes': [2, 8],
                  'children': [{'id': 14,
                    'name': 'parse_expr',
                    'indexes': [],
                    'children': [{'id': 15,
                      'name': 'parse_expr:while_1 ? [1]',
                      'indexes': [],
                      'children': [{'id': 16,
                        'name': 'parse_expr:if_1 = 0#[1, -1]',
                        'indexes': [],
                        'children': [{'id': 17,
                          'name': 'parse_num',
                          'indexes': [],
                          'children': [{'id': 18,
                            'name': 'is_digit',
                            'indexes': [3],
                            'children': []},
                           {'id': 19,
                            'name': 'parse_num:while_1 ? [1]',
                            'indexes': [],
                            'children': []},
                           {'id': 20,
                            'name': 'is_digit',
                            'indexes': [4],
                            'children': []},
                           {'id': 21,
                            'name': 'parse_num:while_1 ? [2]',
                            'indexes': [],
                            'children': []},
                           {'id': 22,
                            'name': 'is_digit',
                            'indexes': [],
                            'children': []}]}]}]},
                     {'id': 23,
                      'name': 'parse_expr:while_1 ? [2]',
                      'indexes': [5],
                      'children': [{'id': 24,
                        'name': 'parse_expr:if_1 = 1#[2, -1]',
                        'indexes': [],
                        'children': []}]},
                     {'id': 25,
                      'name': 'parse_expr:while_1 ? [3]',
                      'indexes': [],
                      'children': [{'id': 26,
                        'name': 'parse_expr:if_1 = 0#[3, -1]',
                        'indexes': [],
                        'children': [{'id': 27,
                          'name': 'parse_num',
                          'indexes': [],
                          'children': [{'id': 28,
                            'name': 'is_digit',
                            'indexes': [6],
                            'children': []},
                           {'id': 29,
                            'name': 'parse_num:while_1 ? [1]',
                            'indexes': [],
                            'children': []},
                           {'id': 30,
                            'name': 'is_digit',
                            'indexes': [7],
                            'children': []},
                           {'id': 31,
                            'name': 'parse_num:while_1 ? [2]',
                            'indexes': [],
                            'children': []},
                           {'id': 32,
                            'name': 'is_digit',
                            'indexes': [],
                            'children': []}]}]}]},
                     {'id': 33,
                      'name': 'parse_expr:while_1 ? [4]',
                      'indexes': [],
                      'children': [{'id': 34,
                        'name': 'parse_expr:if_1 = 3#[4, -1]',
                        'indexes': [],
                        'children': []}]}]}]}]}]},
             {'id': 35,
              'name': 'parse_expr:while_1 ? [4]',
              'indexes': [9],
              'children': [{'id': 36,
                'name': 'parse_expr:if_1 = 1#[4, -1]',
                'indexes': [],
                'children': []}]},
             {'id': 37,
              'name': 'parse_expr:while_1 ? [5]',
              'indexes': [],
              'children': [{'id': 38,
                'name': 'parse_expr:if_1 = 0#[5, -1]',
                'indexes': [],
                'children': [{'id': 39,
                  'name': 'parse_num',
                  'indexes': [],
                  'children': [{'id': 40,
                    'name': 'is_digit',
                    'indexes': [10],
                    'children': []},
                   {'id': 41,
                    'name': 'parse_num:while_1 ? [1]',
                    'indexes': [],
                    'children': []},
                   {'id': 42,
                    'name': 'is_digit',
                    'indexes': [],
                    'children': []}]}]}]},
             {'id': 43,
              'name': 'parse_expr:while_1 ? [6]',
              'indexes': [11],
              'children': [{'id': 44,
                'name': 'parse_expr:if_1 = 1#[6, -1]',
                'indexes': [],
                'children': []}]},
             {'id': 45,
              'name': 'parse_expr:while_1 ? [7]',
              'indexes': [],
              'children': [{'id': 46,
                'name': 'parse_expr:if_1 = 0#[7, -1]',
                'indexes': [],
                'children': [{'id': 47,
                  'name': 'parse_num',
                  'indexes': [],
                  'children': [{'id': 48,
                    'name': 'is_digit',
                    'indexes': [12],
                    'children': []},
                   {'id': 49,
                    'name': 'parse_num:while_1 ? [1]',
                    'indexes': [],
                    'children': []},
                   {'id': 50, 'name': 'is_digit', 'indexes': [13], 'children': []},
                   {'id': 51,
                    'name': 'parse_num:while_1 ? [2]',
                    'indexes': [],
                    'children': []},
                   {'id': 52, 'name': 'is_digit', 'indexes': [14], 'children': []},
                   {'id': 53,
                    'name': 'parse_num:while_1 ? [3]',
                    'indexes': [],
                    'children': []}]}]}]}]}]}],
        'indexes': []},
       1: {'id': 1,
        'name': 'main',
        'indexes': [],
        'children': [{'id': 2,
          'name': 'parse_expr',
          'indexes': [],
          'children': [{'id': 3,
            'name': 'parse_expr:while_1 ? [1]',
            'indexes': [],
            'children': [{'id': 4,
              'name': 'parse_expr:if_1 = 0#[1, -1]',
              'indexes': [],
              'children': [{'id': 5,
                'name': 'parse_num',
                'indexes': [],
                'children': [{'id': 6,
                  'name': 'is_digit',
                  'indexes': [0],
                  'children': []},
                 {'id': 7,
                  'name': 'parse_num:while_1 ? [1]',
                  'indexes': [],
                  'children': []},
                 {'id': 8, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
           {'id': 9,
            'name': 'parse_expr:while_1 ? [2]',
            'indexes': [1],
            'children': [{'id': 10,
              'name': 'parse_expr:if_1 = 1#[2, -1]',
              'indexes': [],
              'children': []}]},
           {'id': 11,
            'name': 'parse_expr:while_1 ? [3]',
            'indexes': [],
            'children': [{'id': 12,
              'name': 'parse_expr:if_1 = 2#[3, -1]',
              'indexes': [],
              'children': [{'id': 13,
                'name': 'parse_paren',
                'indexes': [2, 8],
                'children': [{'id': 14,
                  'name': 'parse_expr',
                  'indexes': [],
                  'children': [{'id': 15,
                    'name': 'parse_expr:while_1 ? [1]',
                    'indexes': [],
                    'children': [{'id': 16,
                      'name': 'parse_expr:if_1 = 0#[1, -1]',
                      'indexes': [],
                      'children': [{'id': 17,
                        'name': 'parse_num',
                        'indexes': [],
                        'children': [{'id': 18,
                          'name': 'is_digit',
                          'indexes': [3],
                          'children': []},
                         {'id': 19,
                          'name': 'parse_num:while_1 ? [1]',
                          'indexes': [],
                          'children': []},
                         {'id': 20,
                          'name': 'is_digit',
                          'indexes': [4],
                          'children': []},
                         {'id': 21,
                          'name': 'parse_num:while_1 ? [2]',
                          'indexes': [],
                          'children': []},
                         {'id': 22,
                          'name': 'is_digit',
                          'indexes': [],
                          'children': []}]}]}]},
                   {'id': 23,
                    'name': 'parse_expr:while_1 ? [2]',
                    'indexes': [5],
                    'children': [{'id': 24,
                      'name': 'parse_expr:if_1 = 1#[2, -1]',
                      'indexes': [],
                      'children': []}]},
                   {'id': 25,
                    'name': 'parse_expr:while_1 ? [3]',
                    'indexes': [],
                    'children': [{'id': 26,
                      'name': 'parse_expr:if_1 = 0#[3, -1]',
                      'indexes': [],
                      'children': [{'id': 27,
                        'name': 'parse_num',
                        'indexes': [],
                        'children': [{'id': 28,
                          'name': 'is_digit',
                          'indexes': [6],
                          'children': []},
                         {'id': 29,
                          'name': 'parse_num:while_1 ? [1]',
                          'indexes': [],
                          'children': []},
                         {'id': 30,
                          'name': 'is_digit',
                          'indexes': [7],
                          'children': []},
                         {'id': 31,
                          'name': 'parse_num:while_1 ? [2]',
                          'indexes': [],
                          'children': []},
                         {'id': 32,
                          'name': 'is_digit',
                          'indexes': [],
                          'children': []}]}]}]},
                   {'id': 33,
                    'name': 'parse_expr:while_1 ? [4]',
                    'indexes': [],
                    'children': [{'id': 34,
                      'name': 'parse_expr:if_1 = 3#[4, -1]',
                      'indexes': [],
                      'children': []}]}]}]}]}]},
           {'id': 35,
            'name': 'parse_expr:while_1 ? [4]',
            'indexes': [9],
            'children': [{'id': 36,
              'name': 'parse_expr:if_1 = 1#[4, -1]',
              'indexes': [],
              'children': []}]},
           {'id': 37,
            'name': 'parse_expr:while_1 ? [5]',
            'indexes': [],
            'children': [{'id': 38,
              'name': 'parse_expr:if_1 = 0#[5, -1]',
              'indexes': [],
              'children': [{'id': 39,
                'name': 'parse_num',
                'indexes': [],
                'children': [{'id': 40,
                  'name': 'is_digit',
                  'indexes': [10],
                  'children': []},
                 {'id': 41,
                  'name': 'parse_num:while_1 ? [1]',
                  'indexes': [],
                  'children': []},
                 {'id': 42, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
           {'id': 43,
            'name': 'parse_expr:while_1 ? [6]',
            'indexes': [11],
            'children': [{'id': 44,
              'name': 'parse_expr:if_1 = 1#[6, -1]',
              'indexes': [],
              'children': []}]},
           {'id': 45,
            'name': 'parse_expr:while_1 ? [7]',
            'indexes': [],
            'children': [{'id': 46,
              'name': 'parse_expr:if_1 = 0#[7, -1]',
              'indexes': [],
              'children': [{'id': 47,
                'name': 'parse_num',
                'indexes': [],
                'children': [{'id': 48,
                  'name': 'is_digit',
                  'indexes': [12],
                  'children': []},
                 {'id': 49,
                  'name': 'parse_num:while_1 ? [1]',
                  'indexes': [],
                  'children': []},
                 {'id': 50, 'name': 'is_digit', 'indexes': [13], 'children': []},
                 {'id': 51,
                  'name': 'parse_num:while_1 ? [2]',
                  'indexes': [],
                  'children': []},
                 {'id': 52, 'name': 'is_digit', 'indexes': [14], 'children': []},
                 {'id': 53,
                  'name': 'parse_num:while_1 ? [3]',
                  'indexes': [],
                  'children': []}]}]}]}]}]},
       2: {'id': 2,
        'name': 'parse_expr',
        'indexes': [],
        'children': [{'id': 3,
          'name': 'parse_expr:while_1 ? [1]',
          'indexes': [],
          'children': [{'id': 4,
            'name': 'parse_expr:if_1 = 0#[1, -1]',
            'indexes': [],
            'children': [{'id': 5,
              'name': 'parse_num',
              'indexes': [],
              'children': [{'id': 6,
                'name': 'is_digit',
                'indexes': [0],
                'children': []},
               {'id': 7,
                'name': 'parse_num:while_1 ? [1]',
                'indexes': [],
                'children': []},
               {'id': 8, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
         {'id': 9,
          'name': 'parse_expr:while_1 ? [2]',
          'indexes': [1],
          'children': [{'id': 10,
            'name': 'parse_expr:if_1 = 1#[2, -1]',
            'indexes': [],
            'children': []}]},
         {'id': 11,
          'name': 'parse_expr:while_1 ? [3]',
          'indexes': [],
          'children': [{'id': 12,
            'name': 'parse_expr:if_1 = 2#[3, -1]',
            'indexes': [],
            'children': [{'id': 13,
              'name': 'parse_paren',
              'indexes': [2, 8],
              'children': [{'id': 14,
                'name': 'parse_expr',
                'indexes': [],
                'children': [{'id': 15,
                  'name': 'parse_expr:while_1 ? [1]',
                  'indexes': [],
                  'children': [{'id': 16,
                    'name': 'parse_expr:if_1 = 0#[1, -1]',
                    'indexes': [],
                    'children': [{'id': 17,
                      'name': 'parse_num',
                      'indexes': [],
                      'children': [{'id': 18,
                        'name': 'is_digit',
                        'indexes': [3],
                        'children': []},
                       {'id': 19,
                        'name': 'parse_num:while_1 ? [1]',
                        'indexes': [],
                        'children': []},
                       {'id': 20,
                        'name': 'is_digit',
                        'indexes': [4],
                        'children': []},
                       {'id': 21,
                        'name': 'parse_num:while_1 ? [2]',
                        'indexes': [],
                        'children': []},
                       {'id': 22,
                        'name': 'is_digit',
                        'indexes': [],
                        'children': []}]}]}]},
                 {'id': 23,
                  'name': 'parse_expr:while_1 ? [2]',
                  'indexes': [5],
                  'children': [{'id': 24,
                    'name': 'parse_expr:if_1 = 1#[2, -1]',
                    'indexes': [],
                    'children': []}]},
                 {'id': 25,
                  'name': 'parse_expr:while_1 ? [3]',
                  'indexes': [],
                  'children': [{'id': 26,
                    'name': 'parse_expr:if_1 = 0#[3, -1]',
                    'indexes': [],
                    'children': [{'id': 27,
                      'name': 'parse_num',
                      'indexes': [],
                      'children': [{'id': 28,
                        'name': 'is_digit',
                        'indexes': [6],
                        'children': []},
                       {'id': 29,
                        'name': 'parse_num:while_1 ? [1]',
                        'indexes': [],
                        'children': []},
                       {'id': 30,
                        'name': 'is_digit',
                        'indexes': [7],
                        'children': []},
                       {'id': 31,
                        'name': 'parse_num:while_1 ? [2]',
                        'indexes': [],
                        'children': []},
                       {'id': 32,
                        'name': 'is_digit',
                        'indexes': [],
                        'children': []}]}]}]},
                 {'id': 33,
                  'name': 'parse_expr:while_1 ? [4]',
                  'indexes': [],
                  'children': [{'id': 34,
                    'name': 'parse_expr:if_1 = 3#[4, -1]',
                    'indexes': [],
                    'children': []}]}]}]}]}]},
         {'id': 35,
          'name': 'parse_expr:while_1 ? [4]',
          'indexes': [9],
          'children': [{'id': 36,
            'name': 'parse_expr:if_1 = 1#[4, -1]',
            'indexes': [],
            'children': []}]},
         {'id': 37,
          'name': 'parse_expr:while_1 ? [5]',
          'indexes': [],
          'children': [{'id': 38,
            'name': 'parse_expr:if_1 = 0#[5, -1]',
            'indexes': [],
            'children': [{'id': 39,
              'name': 'parse_num',
              'indexes': [],
              'children': [{'id': 40,
                'name': 'is_digit',
                'indexes': [10],
                'children': []},
               {'id': 41,
                'name': 'parse_num:while_1 ? [1]',
                'indexes': [],
                'children': []},
               {'id': 42, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
         {'id': 43,
          'name': 'parse_expr:while_1 ? [6]',
          'indexes': [11],
          'children': [{'id': 44,
            'name': 'parse_expr:if_1 = 1#[6, -1]',
            'indexes': [],
            'children': []}]},
         {'id': 45,
          'name': 'parse_expr:while_1 ? [7]',
          'indexes': [],
          'children': [{'id': 46,
            'name': 'parse_expr:if_1 = 0#[7, -1]',
            'indexes': [],
            'children': [{'id': 47,
              'name': 'parse_num',
              'indexes': [],
              'children': [{'id': 48,
                'name': 'is_digit',
                'indexes': [12],
                'children': []},
               {'id': 49,
                'name': 'parse_num:while_1 ? [1]',
                'indexes': [],
                'children': []},
               {'id': 50, 'name': 'is_digit', 'indexes': [13], 'children': []},
               {'id': 51,
                'name': 'parse_num:while_1 ? [2]',
                'indexes': [],
                'children': []},
               {'id': 52, 'name': 'is_digit', 'indexes': [14], 'children': []},
               {'id': 53,
                'name': 'parse_num:while_1 ? [3]',
                'indexes': [],
                'children': []}]}]}]}]},
       3: {'id': 3,
        'name': 'parse_expr:while_1 ? [1]',
        'indexes': [],
        'children': [{'id': 4,
          'name': 'parse_expr:if_1 = 0#[1, -1]',
          'indexes': [],
          'children': [{'id': 5,
            'name': 'parse_num',
            'indexes': [],
            'children': [{'id': 6,
              'name': 'is_digit',
              'indexes': [0],
              'children': []},
             {'id': 7,
              'name': 'parse_num:while_1 ? [1]',
              'indexes': [],
              'children': []},
             {'id': 8, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
       9: {'id': 9,
        'name': 'parse_expr:while_1 ? [2]',
        'indexes': [1],
        'children': [{'id': 10,
          'name': 'parse_expr:if_1 = 1#[2, -1]',
          'indexes': [],
          'children': []}]},
       11: {'id': 11,
        'name': 'parse_expr:while_1 ? [3]',
        'indexes': [],
        'children': [{'id': 12,
          'name': 'parse_expr:if_1 = 2#[3, -1]',
          'indexes': [],
          'children': [{'id': 13,
            'name': 'parse_paren',
            'indexes': [2, 8],
            'children': [{'id': 14,
              'name': 'parse_expr',
              'indexes': [],
              'children': [{'id': 15,
                'name': 'parse_expr:while_1 ? [1]',
                'indexes': [],
                'children': [{'id': 16,
                  'name': 'parse_expr:if_1 = 0#[1, -1]',
                  'indexes': [],
                  'children': [{'id': 17,
                    'name': 'parse_num',
                    'indexes': [],
                    'children': [{'id': 18,
                      'name': 'is_digit',
                      'indexes': [3],
                      'children': []},
                     {'id': 19,
                      'name': 'parse_num:while_1 ? [1]',
                      'indexes': [],
                      'children': []},
                     {'id': 20, 'name': 'is_digit', 'indexes': [4], 'children': []},
                     {'id': 21,
                      'name': 'parse_num:while_1 ? [2]',
                      'indexes': [],
                      'children': []},
                     {'id': 22,
                      'name': 'is_digit',
                      'indexes': [],
                      'children': []}]}]}]},
               {'id': 23,
                'name': 'parse_expr:while_1 ? [2]',
                'indexes': [5],
                'children': [{'id': 24,
                  'name': 'parse_expr:if_1 = 1#[2, -1]',
                  'indexes': [],
                  'children': []}]},
               {'id': 25,
                'name': 'parse_expr:while_1 ? [3]',
                'indexes': [],
                'children': [{'id': 26,
                  'name': 'parse_expr:if_1 = 0#[3, -1]',
                  'indexes': [],
                  'children': [{'id': 27,
                    'name': 'parse_num',
                    'indexes': [],
                    'children': [{'id': 28,
                      'name': 'is_digit',
                      'indexes': [6],
                      'children': []},
                     {'id': 29,
                      'name': 'parse_num:while_1 ? [1]',
                      'indexes': [],
                      'children': []},
                     {'id': 30, 'name': 'is_digit', 'indexes': [7], 'children': []},
                     {'id': 31,
                      'name': 'parse_num:while_1 ? [2]',
                      'indexes': [],
                      'children': []},
                     {'id': 32,
                      'name': 'is_digit',
                      'indexes': [],
                      'children': []}]}]}]},
               {'id': 33,
                'name': 'parse_expr:while_1 ? [4]',
                'indexes': [],
                'children': [{'id': 34,
                  'name': 'parse_expr:if_1 = 3#[4, -1]',
                  'indexes': [],
                  'children': []}]}]}]}]}]},
       35: {'id': 35,
        'name': 'parse_expr:while_1 ? [4]',
        'indexes': [9],
        'children': [{'id': 36,
          'name': 'parse_expr:if_1 = 1#[4, -1]',
          'indexes': [],
          'children': []}]},
       37: {'id': 37,
        'name': 'parse_expr:while_1 ? [5]',
        'indexes': [],
        'children': [{'id': 38,
          'name': 'parse_expr:if_1 = 0#[5, -1]',
          'indexes': [],
          'children': [{'id': 39,
            'name': 'parse_num',
            'indexes': [],
            'children': [{'id': 40,
              'name': 'is_digit',
              'indexes': [10],
              'children': []},
             {'id': 41,
              'name': 'parse_num:while_1 ? [1]',
              'indexes': [],
              'children': []},
             {'id': 42, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
       43: {'id': 43,
        'name': 'parse_expr:while_1 ? [6]',
        'indexes': [11],
        'children': [{'id': 44,
          'name': 'parse_expr:if_1 = 1#[6, -1]',
          'indexes': [],
          'children': []}]},
       45: {'id': 45,
        'name': 'parse_expr:while_1 ? [7]',
        'indexes': [],
        'children': [{'id': 46,
          'name': 'parse_expr:if_1 = 0#[7, -1]',
          'indexes': [],
          'children': [{'id': 47,
            'name': 'parse_num',
            'indexes': [],
            'children': [{'id': 48,
              'name': 'is_digit',
              'indexes': [12],
              'children': []},
             {'id': 49,
              'name': 'parse_num:while_1 ? [1]',
              'indexes': [],
              'children': []},
             {'id': 50, 'name': 'is_digit', 'indexes': [13], 'children': []},
             {'id': 51,
              'name': 'parse_num:while_1 ? [2]',
              'indexes': [],
              'children': []},
             {'id': 52, 'name': 'is_digit', 'indexes': [14], 'children': []},
             {'id': 53,
              'name': 'parse_num:while_1 ? [3]',
              'indexes': [],
              'children': []}]}]}]},
       4: {'id': 4,
        'name': 'parse_expr:if_1 = 0#[1, -1]',
        'indexes': [],
        'children': [{'id': 5,
          'name': 'parse_num',
          'indexes': [],
          'children': [{'id': 6, 'name': 'is_digit', 'indexes': [0], 'children': []},
           {'id': 7,
            'name': 'parse_num:while_1 ? [1]',
            'indexes': [],
            'children': []},
           {'id': 8, 'name': 'is_digit', 'indexes': [], 'children': []}]}]},
       5: {'id': 5,
        'name': 'parse_num',
        'indexes': [],
        'children': [{'id': 6, 'name': 'is_digit', 'indexes': [0], 'children': []},
         {'id': 7, 'name': 'parse_num:while_1 ? [1]', 'indexes': [], 'children': []},
         {'id': 8, 'name': 'is_digit', 'indexes': [], 'children': []}]},
       6: {'id': 6, 'name': 'is_digit', 'indexes': [0], 'children': []},
       7: {'id': 7,
        'name': 'parse_num:while_1 ? [1]',
        'indexes': [],
        'children': []},
       8: {'id': 8, 'name': 'is_digit', 'indexes': [], 'children': []},
       10: {'id': 10,
        'name': 'parse_expr:if_1 = 1#[2, -1]',
        'indexes': [],
        'children': []},
       12: {'id': 12,
        'name': 'parse_expr:if_1 = 2#[3, -1]',
        'indexes': [],
        'children': [{'id': 13,
          'name': 'parse_paren',
          'indexes': [2, 8],
          'children': [{'id': 14,
            'name': 'parse_expr',
            'indexes': [],
            'children': [{'id': 15,
              'name': 'parse_expr:while_1 ? [1]',
              'indexes': [],
              'children': [{'id': 16,
                'name': 'parse_expr:if_1 = 0#[1, -1]',
                'indexes': [],
                'children': [{'id': 17,
                  'name': 'parse_num',
                  'indexes': [],
                  'children': [{'id': 18,
                    'name': 'is_digit',
                    'indexes': [3],
                    'children': []},
                   {'id': 19,
                    'name': 'parse_num:while_1 ? [1]',
                    'indexes': [],
                    'children': []},
                   {'id': 20, 'name': 'is_digit', 'indexes': [4], 'children': []},
                   {'id': 21,
                    'name': 'parse_num:while_1 ? [2]',
                    'indexes': [],
                    'children': []},
                   {'id': 22,
                    'name': 'is_digit',
                    'indexes': [],
                    'children': []}]}]}]},
             {'id': 23,
              'name': 'parse_expr:while_1 ? [2]',
              'indexes': [5],
              'children': [{'id': 24,
                'name': 'parse_expr:if_1 = 1#[2, -1]',
                'indexes': [],
                'children': []}]},
             {'id': 25,
              'name': 'parse_expr:while_1 ? [3]',
              'indexes': [],
              'children': [{'id': 26,
                'name': 'parse_expr:if_1 = 0#[3, -1]',
                'indexes': [],
                'children': [{'id': 27,
                  'name': 'parse_num',
                  'indexes': [],
                  'children': [{'id': 28,
                    'name': 'is_digit',
                    'indexes': [6],
                    'children': []},
                   {'id': 29,
                    'name': 'parse_num:while_1 ? [1]',
                    'indexes': [],
                    'children': []},
                   {'id': 30, 'name': 'is_digit', 'indexes': [7], 'children': []},
                   {'id': 31,
                    'name': 'parse_num:while_1 ? [2]',
                    'indexes': [],
                    'children': []},
                   {'id': 32,
                    'name': 'is_digit',
                    'indexes': [],
                    'children': []}]}]}]},
             {'id': 33,
              'name': 'parse_expr:while_1 ? [4]',
              'indexes': [],
              'children': [{'id': 34,
                'name': 'parse_expr:if_1 = 3#[4, -1]',
                'indexes': [],
                'children': []}]}]}]}]},
       13: {'id': 13,
        'name': 'parse_paren',
        'indexes': [2, 8],
        'children': [{'id': 14,
          'name': 'parse_expr',
          'indexes': [],
          'children': [{'id': 15,
            'name': 'parse_expr:while_1 ? [1]',
            'indexes': [],
            'children': [{'id': 16,
              'name': 'parse_expr:if_1 = 0#[1, -1]',
              'indexes': [],
              'children': [{'id': 17,
                'name': 'parse_num',
                'indexes': [],
                'children': [{'id': 18,
                  'name': 'is_digit',
                  'indexes': [3],
                  'children': []},
                 {'id': 19,
                  'name': 'parse_num:while_1 ? [1]',
                  'indexes': [],
                  'children': []},
                 {'id': 20, 'name': 'is_digit', 'indexes': [4], 'children': []},
                 {'id': 21,
                  'name': 'parse_num:while_1 ? [2]',
                  'indexes': [],
                  'children': []},
                 {'id': 22, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
           {'id': 23,
            'name': 'parse_expr:while_1 ? [2]',
            'indexes': [5],
            'children': [{'id': 24,
              'name': 'parse_expr:if_1 = 1#[2, -1]',
              'indexes': [],
              'children': []}]},
           {'id': 25,
            'name': 'parse_expr:while_1 ? [3]',
            'indexes': [],
            'children': [{'id': 26,
              'name': 'parse_expr:if_1 = 0#[3, -1]',
              'indexes': [],
              'children': [{'id': 27,
                'name': 'parse_num',
                'indexes': [],
                'children': [{'id': 28,
                  'name': 'is_digit',
                  'indexes': [6],
                  'children': []},
                 {'id': 29,
                  'name': 'parse_num:while_1 ? [1]',
                  'indexes': [],
                  'children': []},
                 {'id': 30, 'name': 'is_digit', 'indexes': [7], 'children': []},
                 {'id': 31,
                  'name': 'parse_num:while_1 ? [2]',
                  'indexes': [],
                  'children': []},
                 {'id': 32, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
           {'id': 33,
            'name': 'parse_expr:while_1 ? [4]',
            'indexes': [],
            'children': [{'id': 34,
              'name': 'parse_expr:if_1 = 3#[4, -1]',
              'indexes': [],
              'children': []}]}]}]},
       14: {'id': 14,
        'name': 'parse_expr',
        'indexes': [],
        'children': [{'id': 15,
          'name': 'parse_expr:while_1 ? [1]',
          'indexes': [],
          'children': [{'id': 16,
            'name': 'parse_expr:if_1 = 0#[1, -1]',
            'indexes': [],
            'children': [{'id': 17,
              'name': 'parse_num',
              'indexes': [],
              'children': [{'id': 18,
                'name': 'is_digit',
                'indexes': [3],
                'children': []},
               {'id': 19,
                'name': 'parse_num:while_1 ? [1]',
                'indexes': [],
                'children': []},
               {'id': 20, 'name': 'is_digit', 'indexes': [4], 'children': []},
               {'id': 21,
                'name': 'parse_num:while_1 ? [2]',
                'indexes': [],
                'children': []},
               {'id': 22, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
         {'id': 23,
          'name': 'parse_expr:while_1 ? [2]',
          'indexes': [5],
          'children': [{'id': 24,
            'name': 'parse_expr:if_1 = 1#[2, -1]',
            'indexes': [],
            'children': []}]},
         {'id': 25,
          'name': 'parse_expr:while_1 ? [3]',
          'indexes': [],
          'children': [{'id': 26,
            'name': 'parse_expr:if_1 = 0#[3, -1]',
            'indexes': [],
            'children': [{'id': 27,
              'name': 'parse_num',
              'indexes': [],
              'children': [{'id': 28,
                'name': 'is_digit',
                'indexes': [6],
                'children': []},
               {'id': 29,
                'name': 'parse_num:while_1 ? [1]',
                'indexes': [],
                'children': []},
               {'id': 30, 'name': 'is_digit', 'indexes': [7], 'children': []},
               {'id': 31,
                'name': 'parse_num:while_1 ? [2]',
                'indexes': [],
                'children': []},
               {'id': 32, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
         {'id': 33,
          'name': 'parse_expr:while_1 ? [4]',
          'indexes': [],
          'children': [{'id': 34,
            'name': 'parse_expr:if_1 = 3#[4, -1]',
            'indexes': [],
            'children': []}]}]},
       15: {'id': 15,
        'name': 'parse_expr:while_1 ? [1]',
        'indexes': [],
        'children': [{'id': 16,
          'name': 'parse_expr:if_1 = 0#[1, -1]',
          'indexes': [],
          'children': [{'id': 17,
            'name': 'parse_num',
            'indexes': [],
            'children': [{'id': 18,
              'name': 'is_digit',
              'indexes': [3],
              'children': []},
             {'id': 19,
              'name': 'parse_num:while_1 ? [1]',
              'indexes': [],
              'children': []},
             {'id': 20, 'name': 'is_digit', 'indexes': [4], 'children': []},
             {'id': 21,
              'name': 'parse_num:while_1 ? [2]',
              'indexes': [],
              'children': []},
             {'id': 22, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
       23: {'id': 23,
        'name': 'parse_expr:while_1 ? [2]',
        'indexes': [5],
        'children': [{'id': 24,
          'name': 'parse_expr:if_1 = 1#[2, -1]',
          'indexes': [],
          'children': []}]},
       25: {'id': 25,
        'name': 'parse_expr:while_1 ? [3]',
        'indexes': [],
        'children': [{'id': 26,
          'name': 'parse_expr:if_1 = 0#[3, -1]',
          'indexes': [],
          'children': [{'id': 27,
            'name': 'parse_num',
            'indexes': [],
            'children': [{'id': 28,
              'name': 'is_digit',
              'indexes': [6],
              'children': []},
             {'id': 29,
              'name': 'parse_num:while_1 ? [1]',
              'indexes': [],
              'children': []},
             {'id': 30, 'name': 'is_digit', 'indexes': [7], 'children': []},
             {'id': 31,
              'name': 'parse_num:while_1 ? [2]',
              'indexes': [],
              'children': []},
             {'id': 32, 'name': 'is_digit', 'indexes': [], 'children': []}]}]}]},
       33: {'id': 33,
        'name': 'parse_expr:while_1 ? [4]',
        'indexes': [],
        'children': [{'id': 34,
          'name': 'parse_expr:if_1 = 3#[4, -1]',
          'indexes': [],
          'children': []}]},
       16: {'id': 16,
        'name': 'parse_expr:if_1 = 0#[1, -1]',
        'indexes': [],
        'children': [{'id': 17,
          'name': 'parse_num',
          'indexes': [],
          'children': [{'id': 18,
            'name': 'is_digit',
            'indexes': [3],
            'children': []},
           {'id': 19,
            'name': 'parse_num:while_1 ? [1]',
            'indexes': [],
            'children': []},
           {'id': 20, 'name': 'is_digit', 'indexes': [4], 'children': []},
           {'id': 21,
            'name': 'parse_num:while_1 ? [2]',
            'indexes': [],
            'children': []},
           {'id': 22, 'name': 'is_digit', 'indexes': [], 'children': []}]}]},
       17: {'id': 17,
        'name': 'parse_num',
        'indexes': [],
        'children': [{'id': 18, 'name': 'is_digit', 'indexes': [3], 'children': []},
         {'id': 19,
          'name': 'parse_num:while_1 ? [1]',
          'indexes': [],
          'children': []},
         {'id': 20, 'name': 'is_digit', 'indexes': [4], 'children': []},
         {'id': 21,
          'name': 'parse_num:while_1 ? [2]',
          'indexes': [],
          'children': []},
         {'id': 22, 'name': 'is_digit', 'indexes': [], 'children': []}]},
       18: {'id': 18, 'name': 'is_digit', 'indexes': [3], 'children': []},
       19: {'id': 19,
        'name': 'parse_num:while_1 ? [1]',
        'indexes': [],
        'children': []},
       20: {'id': 20, 'name': 'is_digit', 'indexes': [4], 'children': []},
       21: {'id': 21,
        'name': 'parse_num:while_1 ? [2]',
        'indexes': [],
        'children': []},
       22: {'id': 22, 'name': 'is_digit', 'indexes': [], 'children': []},
       24: {'id': 24,
        'name': 'parse_expr:if_1 = 1#[2, -1]',
        'indexes': [],
        'children': []},
       26: {'id': 26,
        'name': 'parse_expr:if_1 = 0#[3, -1]',
        'indexes': [],
        'children': [{'id': 27,
          'name': 'parse_num',
          'indexes': [],
          'children': [{'id': 28,
            'name': 'is_digit',
            'indexes': [6],
            'children': []},
           {'id': 29,
            'name': 'parse_num:while_1 ? [1]',
            'indexes': [],
            'children': []},
           {'id': 30, 'name': 'is_digit', 'indexes': [7], 'children': []},
           {'id': 31,
            'name': 'parse_num:while_1 ? [2]',
            'indexes': [],
            'children': []},
           {'id': 32, 'name': 'is_digit', 'indexes': [], 'children': []}]}]},
       27: {'id': 27,
        'name': 'parse_num',
        'indexes': [],
        'children': [{'id': 28, 'name': 'is_digit', 'indexes': [6], 'children': []},
         {'id': 29,
          'name': 'parse_num:while_1 ? [1]',
          'indexes': [],
          'children': []},
         {'id': 30, 'name': 'is_digit', 'indexes': [7], 'children': []},
         {'id': 31,
          'name': 'parse_num:while_1 ? [2]',
          'indexes': [],
          'children': []},
         {'id': 32, 'name': 'is_digit', 'indexes': [], 'children': []}]},
       28: {'id': 28, 'name': 'is_digit', 'indexes': [6], 'children': []},
       29: {'id': 29,
        'name': 'parse_num:while_1 ? [1]',
        'indexes': [],
        'children': []},
       30: {'id': 30, 'name': 'is_digit', 'indexes': [7], 'children': []},
       31: {'id': 31,
        'name': 'parse_num:while_1 ? [2]',
        'indexes': [],
        'children': []},
       32: {'id': 32, 'name': 'is_digit', 'indexes': [], 'children': []},
       34: {'id': 34,
        'name': 'parse_expr:if_1 = 3#[4, -1]',
        'indexes': [],
        'children': []},
       36: {'id': 36,
        'name': 'parse_expr:if_1 = 1#[4, -1]',
        'indexes': [],
        'children': []},
       38: {'id': 38,
        'name': 'parse_expr:if_1 = 0#[5, -1]',
        'indexes': [],
        'children': [{'id': 39,
          'name': 'parse_num',
          'indexes': [],
          'children': [{'id': 40,
            'name': 'is_digit',
            'indexes': [10],
            'children': []},
           {'id': 41,
            'name': 'parse_num:while_1 ? [1]',
            'indexes': [],
            'children': []},
           {'id': 42, 'name': 'is_digit', 'indexes': [], 'children': []}]}]},
       39: {'id': 39,
        'name': 'parse_num',
        'indexes': [],
        'children': [{'id': 40, 'name': 'is_digit', 'indexes': [10], 'children': []},
         {'id': 41,
          'name': 'parse_num:while_1 ? [1]',
          'indexes': [],
          'children': []},
         {'id': 42, 'name': 'is_digit', 'indexes': [], 'children': []}]},
       40: {'id': 40, 'name': 'is_digit', 'indexes': [10], 'children': []},
       41: {'id': 41,
        'name': 'parse_num:while_1 ? [1]',
        'indexes': [],
        'children': []},
       42: {'id': 42, 'name': 'is_digit', 'indexes': [], 'children': []},
       44: {'id': 44,
        'name': 'parse_expr:if_1 = 1#[6, -1]',
        'indexes': [],
        'children': []},
       46: {'id': 46,
        'name': 'parse_expr:if_1 = 0#[7, -1]',
        'indexes': [],
        'children': [{'id': 47,
          'name': 'parse_num',
          'indexes': [],
          'children': [{'id': 48,
            'name': 'is_digit',
            'indexes': [12],
            'children': []},
           {'id': 49,
            'name': 'parse_num:while_1 ? [1]',
            'indexes': [],
            'children': []},
           {'id': 50, 'name': 'is_digit', 'indexes': [13], 'children': []},
           {'id': 51,
            'name': 'parse_num:while_1 ? [2]',
            'indexes': [],
            'children': []},
           {'id': 52, 'name': 'is_digit', 'indexes': [14], 'children': []},
           {'id': 53,
            'name': 'parse_num:while_1 ? [3]',
            'indexes': [],
            'children': []}]}]},
       47: {'id': 47,
        'name': 'parse_num',
        'indexes': [],
        'children': [{'id': 48, 'name': 'is_digit', 'indexes': [12], 'children': []},
         {'id': 49,
          'name': 'parse_num:while_1 ? [1]',
          'indexes': [],
          'children': []},
         {'id': 50, 'name': 'is_digit', 'indexes': [13], 'children': []},
         {'id': 51,
          'name': 'parse_num:while_1 ? [2]',
          'indexes': [],
          'children': []},
         {'id': 52, 'name': 'is_digit', 'indexes': [14], 'children': []},
         {'id': 53,
          'name': 'parse_num:while_1 ? [3]',
          'indexes': [],
          'children': []}]},
       48: {'id': 48, 'name': 'is_digit', 'indexes': [12], 'children': []},
       49: {'id': 49,
        'name': 'parse_num:while_1 ? [1]',
        'indexes': [],
        'children': []},
       50: {'id': 50, 'name': 'is_digit', 'indexes': [13], 'children': []},
       51: {'id': 51,
        'name': 'parse_num:while_1 ? [2]',
        'indexes': [],
        'children': []},
       52: {'id': 52, 'name': 'is_digit', 'indexes': [14], 'children': []},
       53: {'id': 53,
        'name': 'parse_num:while_1 ? [3]',
        'indexes': [],
        'children': []}}
      
      . . .
      In [93]:
      xxxxxxxxxx
      
      1
       
      1
      %top attach_comparisons(mathexpr_method_tree1, mathexpr_last_comparisons1)
      
      executed in 4ms, finished 04:51:44 2019-08-15
      . . .
      In [94]:
      xxxxxxxxxx
      
      1
       
      1
      %top mathexpr_method_tree1
      
      executed in 79ms, finished 04:51:44 2019-08-15
      Out[94]:
      {0: {'id': 0,
        'name': None,
        'children': [{'id': 1,
          'name': 'main',
          'indexes': [],
          'children': [{'id': 2, 'name': '__init__', 'indexes': [], 'children': []},
           {'id': 3,
            'name': 'getValue',
            'indexes': [],
            'children': [{'id': 4,
              'name': 'parseExpression',
              'indexes': [],
              'children': [{'id': 5,
                'name': 'parseAddition',
                'indexes': [],
                'children': [{'id': 6,
                  'name': 'parseMultiplication',
                  'indexes': [],
                  'children': [{'id': 7,
                    'name': 'parseParenthesis',
                    'indexes': [],
                    'children': [{'id': 8,
                      'name': 'skipWhitespace',
                      'indexes': [],
                      'children': [{'id': 9,
                        'name': 'hasNext',
                        'indexes': [],
                        'children': []},
                       {'id': 10,
                        'name': 'skipWhitespace:while_1 ? [1]',
                        'indexes': [],
                        'children': [{'id': 11,
                          'name': 'peek',
                          'indexes': [],
                          'children': []},
                         {'id': 12,
                          'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                          'indexes': [],
                          'children': []}]}]},
                     {'id': 13, 'name': 'peek', 'indexes': [], 'children': []},
                     {'id': 14,
                      'name': 'parseParenthesis:if_1 = 1#[-1]',
                      'indexes': [],
                      'children': [{'id': 15,
                        'name': 'parseNegative',
                        'indexes': [],
                        'children': [{'id': 16,
                          'name': 'skipWhitespace',
                          'indexes': [],
                          'children': [{'id': 17,
                            'name': 'hasNext',
                            'indexes': [],
                            'children': []},
                           {'id': 18,
                            'name': 'skipWhitespace:while_1 ? [1]',
                            'indexes': [],
                            'children': [{'id': 19,
                              'name': 'peek',
                              'indexes': [],
                              'children': []},
                             {'id': 20,
                              'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                              'indexes': [],
                              'children': []}]}]},
                         {'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
                         {'id': 22,
                          'name': 'parseNegative:if_1 = 1#[-1]',
                          'indexes': [],
                          'children': [{'id': 23,
                            'name': 'parseValue',
                            'indexes': [],
                            'children': [{'id': 24,
                              'name': 'skipWhitespace',
                              'indexes': [],
                              'children': [{'id': 25,
                                'name': 'hasNext',
                                'indexes': [],
                                'children': []},
                               {'id': 26,
                                'name': 'skipWhitespace:while_1 ? [1]',
                                'indexes': [],
                                'children': [{'id': 27,
                                  'name': 'peek',
                                  'indexes': [],
                                  'children': []},
                                 {'id': 28,
                                  'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                                  'indexes': [],
                                  'children': []}]}]},
                             {'id': 29,
                              'name': 'peek',
                              'indexes': [],
                              'children': []},
                             {'id': 30,
                              'name': 'parseValue:if_1 = 0#[-1]',
                              'indexes': [],
                              'children': [{'id': 31,
                                'name': 'parseNumber',
                                'indexes': [],
                                'children': [{'id': 32,
                                  'name': 'skipWhitespace',
                                  'indexes': [],
                                  'children': [{'id': 33,
                                    'name': 'hasNext',
                                    'indexes': [],
                                    'children': []},
                                   {'id': 34,
                                    'name': 'skipWhitespace:while_1 ? [1]',
                                    'indexes': [],
                                    'children': [{'id': 35,
                                      'name': 'peek',
                                      'indexes': [],
                                      'children': []},
                                     {'id': 36,
                                      'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                                      'indexes': [],
                                      'children': []}]}]},
                                 {'id': 37,
                                  'name': 'hasNext',
                                  'indexes': [],
                                  'children': []},
                                 {'id': 38,
                                  'name': 'parseNumber:while_1 ? [1]',
                                  'indexes': [0],
                                  'children': [{'id': 39,
                                    'name': 'peek',
                                    'indexes': [],
                                    'children': []},
                                   {'id': 40,
                                    'name': 'parseNumber:if_1 = 1#[1, -1]',
                                    'indexes': [],
                                    'children': []}]},
                                 {'id': 41,
                                  'name': 'hasNext',
                                  'indexes': [],
                                  'children': []},
                                 {'id': 42,
                                  'name': 'parseNumber:while_1 ? [2]',
                                  'indexes': [1],
                                  'children': [{'id': 43,
                                    'name': 'peek',
                                    'indexes': [],
                                    'children': []},
                                   {'id': 44,
                                    'name': 'parseNumber:if_1 = 1#[2, -1]',
                                    'indexes': [],
                                    'children': []}]},
                                 {'id': 45,
                                  'name': 'hasNext',
                                  'indexes': [],
                                  'children': []},
                                 {'id': 46,
                                  'name': 'parseNumber:while_1 ? [3]',
                                  'indexes': [2],
                                  'children': [{'id': 47,
                                    'name': 'peek',
                                    'indexes': [],
                                    'children': []},
                                   {'id': 48,
                                    'name': 'parseNumber:if_1 = 1#[3, -1]',
                                    'indexes': [],
                                    'children': []}]},
                                 {'id': 49,
                                  'name': 'hasNext',
                                  'indexes': [],
                                  'children': []}]}]}]}]}]}]}]},
                   {'id': 50,
                    'name': 'parseMultiplication:while_1 ? [1]',
                    'indexes': [],
                    'children': [{'id': 51,
                      'name': 'skipWhitespace',
                      'indexes': [],
                      'children': [{'id': 52,
                        'name': 'hasNext',
                        'indexes': [],
                        'children': []}]},
                     {'id': 53, 'name': 'peek', 'indexes': [], 'children': []},
                     {'id': 54,
                      'name': 'parseMultiplication:if_1 = 2#[1, -1]',
                      'indexes': [],
                      'children': []}]}]},
                 {'id': 55,
                  'name': 'parseAddition:while_1 ? [1]',
                  'indexes': [],
                  'children': [{'id': 56,
                    'name': 'skipWhitespace',
                    'indexes': [],
                    'children': [{'id': 57,
                      'name': 'hasNext',
                      'indexes': [],
                      'children': []}]},
                   {'id': 58, 'name': 'peek', 'indexes': [], 'children': []},
                   {'id': 59,
                    'name': 'parseAddition:if_1 = 2#[1, -1]',
                    'indexes': [],
                    'children': []}]}]}]},
             {'id': 60,
              'name': 'skipWhitespace',
              'indexes': [],
              'children': [{'id': 61,
                'name': 'hasNext',
                'indexes': [],
                'children': []}]},
             {'id': 62, 'name': 'hasNext', 'indexes': [], 'children': []}]}]}],
        'indexes': []},
       1: {'id': 1,
        'name': 'main',
        'indexes': [],
        'children': [{'id': 2, 'name': '__init__', 'indexes': [], 'children': []},
         {'id': 3,
          'name': 'getValue',
          'indexes': [],
          'children': [{'id': 4,
            'name': 'parseExpression',
            'indexes': [],
            'children': [{'id': 5,
              'name': 'parseAddition',
              'indexes': [],
              'children': [{'id': 6,
                'name': 'parseMultiplication',
                'indexes': [],
                'children': [{'id': 7,
                  'name': 'parseParenthesis',
                  'indexes': [],
                  'children': [{'id': 8,
                    'name': 'skipWhitespace',
                    'indexes': [],
                    'children': [{'id': 9,
                      'name': 'hasNext',
                      'indexes': [],
                      'children': []},
                     {'id': 10,
                      'name': 'skipWhitespace:while_1 ? [1]',
                      'indexes': [],
                      'children': [{'id': 11,
                        'name': 'peek',
                        'indexes': [],
                        'children': []},
                       {'id': 12,
                        'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                        'indexes': [],
                        'children': []}]}]},
                   {'id': 13, 'name': 'peek', 'indexes': [], 'children': []},
                   {'id': 14,
                    'name': 'parseParenthesis:if_1 = 1#[-1]',
                    'indexes': [],
                    'children': [{'id': 15,
                      'name': 'parseNegative',
                      'indexes': [],
                      'children': [{'id': 16,
                        'name': 'skipWhitespace',
                        'indexes': [],
                        'children': [{'id': 17,
                          'name': 'hasNext',
                          'indexes': [],
                          'children': []},
                         {'id': 18,
                          'name': 'skipWhitespace:while_1 ? [1]',
                          'indexes': [],
                          'children': [{'id': 19,
                            'name': 'peek',
                            'indexes': [],
                            'children': []},
                           {'id': 20,
                            'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                            'indexes': [],
                            'children': []}]}]},
                       {'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
                       {'id': 22,
                        'name': 'parseNegative:if_1 = 1#[-1]',
                        'indexes': [],
                        'children': [{'id': 23,
                          'name': 'parseValue',
                          'indexes': [],
                          'children': [{'id': 24,
                            'name': 'skipWhitespace',
                            'indexes': [],
                            'children': [{'id': 25,
                              'name': 'hasNext',
                              'indexes': [],
                              'children': []},
                             {'id': 26,
                              'name': 'skipWhitespace:while_1 ? [1]',
                              'indexes': [],
                              'children': [{'id': 27,
                                'name': 'peek',
                                'indexes': [],
                                'children': []},
                               {'id': 28,
                                'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                                'indexes': [],
                                'children': []}]}]},
                           {'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
                           {'id': 30,
                            'name': 'parseValue:if_1 = 0#[-1]',
                            'indexes': [],
                            'children': [{'id': 31,
                              'name': 'parseNumber',
                              'indexes': [],
                              'children': [{'id': 32,
                                'name': 'skipWhitespace',
                                'indexes': [],
                                'children': [{'id': 33,
                                  'name': 'hasNext',
                                  'indexes': [],
                                  'children': []},
                                 {'id': 34,
                                  'name': 'skipWhitespace:while_1 ? [1]',
                                  'indexes': [],
                                  'children': [{'id': 35,
                                    'name': 'peek',
                                    'indexes': [],
                                    'children': []},
                                   {'id': 36,
                                    'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                                    'indexes': [],
                                    'children': []}]}]},
                               {'id': 37,
                                'name': 'hasNext',
                                'indexes': [],
                                'children': []},
                               {'id': 38,
                                'name': 'parseNumber:while_1 ? [1]',
                                'indexes': [0],
                                'children': [{'id': 39,
                                  'name': 'peek',
                                  'indexes': [],
                                  'children': []},
                                 {'id': 40,
                                  'name': 'parseNumber:if_1 = 1#[1, -1]',
                                  'indexes': [],
                                  'children': []}]},
                               {'id': 41,
                                'name': 'hasNext',
                                'indexes': [],
                                'children': []},
                               {'id': 42,
                                'name': 'parseNumber:while_1 ? [2]',
                                'indexes': [1],
                                'children': [{'id': 43,
                                  'name': 'peek',
                                  'indexes': [],
                                  'children': []},
                                 {'id': 44,
                                  'name': 'parseNumber:if_1 = 1#[2, -1]',
                                  'indexes': [],
                                  'children': []}]},
                               {'id': 45,
                                'name': 'hasNext',
                                'indexes': [],
                                'children': []},
                               {'id': 46,
                                'name': 'parseNumber:while_1 ? [3]',
                                'indexes': [2],
                                'children': [{'id': 47,
                                  'name': 'peek',
                                  'indexes': [],
                                  'children': []},
                                 {'id': 48,
                                  'name': 'parseNumber:if_1 = 1#[3, -1]',
                                  'indexes': [],
                                  'children': []}]},
                               {'id': 49,
                                'name': 'hasNext',
                                'indexes': [],
                                'children': []}]}]}]}]}]}]}]},
                 {'id': 50,
                  'name': 'parseMultiplication:while_1 ? [1]',
                  'indexes': [],
                  'children': [{'id': 51,
                    'name': 'skipWhitespace',
                    'indexes': [],
                    'children': [{'id': 52,
                      'name': 'hasNext',
                      'indexes': [],
                      'children': []}]},
                   {'id': 53, 'name': 'peek', 'indexes': [], 'children': []},
                   {'id': 54,
                    'name': 'parseMultiplication:if_1 = 2#[1, -1]',
                    'indexes': [],
                    'children': []}]}]},
               {'id': 55,
                'name': 'parseAddition:while_1 ? [1]',
                'indexes': [],
                'children': [{'id': 56,
                  'name': 'skipWhitespace',
                  'indexes': [],
                  'children': [{'id': 57,
                    'name': 'hasNext',
                    'indexes': [],
                    'children': []}]},
                 {'id': 58, 'name': 'peek', 'indexes': [], 'children': []},
                 {'id': 59,
                  'name': 'parseAddition:if_1 = 2#[1, -1]',
                  'indexes': [],
                  'children': []}]}]}]},
           {'id': 60,
            'name': 'skipWhitespace',
            'indexes': [],
            'children': [{'id': 61,
              'name': 'hasNext',
              'indexes': [],
              'children': []}]},
           {'id': 62, 'name': 'hasNext', 'indexes': [], 'children': []}]}]},
       2: {'id': 2, 'name': '__init__', 'indexes': [], 'children': []},
       3: {'id': 3,
        'name': 'getValue',
        'indexes': [],
        'children': [{'id': 4,
          'name': 'parseExpression',
          'indexes': [],
          'children': [{'id': 5,
            'name': 'parseAddition',
            'indexes': [],
            'children': [{'id': 6,
              'name': 'parseMultiplication',
              'indexes': [],
              'children': [{'id': 7,
                'name': 'parseParenthesis',
                'indexes': [],
                'children': [{'id': 8,
                  'name': 'skipWhitespace',
                  'indexes': [],
                  'children': [{'id': 9,
                    'name': 'hasNext',
                    'indexes': [],
                    'children': []},
                   {'id': 10,
                    'name': 'skipWhitespace:while_1 ? [1]',
                    'indexes': [],
                    'children': [{'id': 11,
                      'name': 'peek',
                      'indexes': [],
                      'children': []},
                     {'id': 12,
                      'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                      'indexes': [],
                      'children': []}]}]},
                 {'id': 13, 'name': 'peek', 'indexes': [], 'children': []},
                 {'id': 14,
                  'name': 'parseParenthesis:if_1 = 1#[-1]',
                  'indexes': [],
                  'children': [{'id': 15,
                    'name': 'parseNegative',
                    'indexes': [],
                    'children': [{'id': 16,
                      'name': 'skipWhitespace',
                      'indexes': [],
                      'children': [{'id': 17,
                        'name': 'hasNext',
                        'indexes': [],
                        'children': []},
                       {'id': 18,
                        'name': 'skipWhitespace:while_1 ? [1]',
                        'indexes': [],
                        'children': [{'id': 19,
                          'name': 'peek',
                          'indexes': [],
                          'children': []},
                         {'id': 20,
                          'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                          'indexes': [],
                          'children': []}]}]},
                     {'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
                     {'id': 22,
                      'name': 'parseNegative:if_1 = 1#[-1]',
                      'indexes': [],
                      'children': [{'id': 23,
                        'name': 'parseValue',
                        'indexes': [],
                        'children': [{'id': 24,
                          'name': 'skipWhitespace',
                          'indexes': [],
                          'children': [{'id': 25,
                            'name': 'hasNext',
                            'indexes': [],
                            'children': []},
                           {'id': 26,
                            'name': 'skipWhitespace:while_1 ? [1]',
                            'indexes': [],
                            'children': [{'id': 27,
                              'name': 'peek',
                              'indexes': [],
                              'children': []},
                             {'id': 28,
                              'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                              'indexes': [],
                              'children': []}]}]},
                         {'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
                         {'id': 30,
                          'name': 'parseValue:if_1 = 0#[-1]',
                          'indexes': [],
                          'children': [{'id': 31,
                            'name': 'parseNumber',
                            'indexes': [],
                            'children': [{'id': 32,
                              'name': 'skipWhitespace',
                              'indexes': [],
                              'children': [{'id': 33,
                                'name': 'hasNext',
                                'indexes': [],
                                'children': []},
                               {'id': 34,
                                'name': 'skipWhitespace:while_1 ? [1]',
                                'indexes': [],
                                'children': [{'id': 35,
                                  'name': 'peek',
                                  'indexes': [],
                                  'children': []},
                                 {'id': 36,
                                  'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                                  'indexes': [],
                                  'children': []}]}]},
                             {'id': 37,
                              'name': 'hasNext',
                              'indexes': [],
                              'children': []},
                             {'id': 38,
                              'name': 'parseNumber:while_1 ? [1]',
                              'indexes': [0],
                              'children': [{'id': 39,
                                'name': 'peek',
                                'indexes': [],
                                'children': []},
                               {'id': 40,
                                'name': 'parseNumber:if_1 = 1#[1, -1]',
                                'indexes': [],
                                'children': []}]},
                             {'id': 41,
                              'name': 'hasNext',
                              'indexes': [],
                              'children': []},
                             {'id': 42,
                              'name': 'parseNumber:while_1 ? [2]',
                              'indexes': [1],
                              'children': [{'id': 43,
                                'name': 'peek',
                                'indexes': [],
                                'children': []},
                               {'id': 44,
                                'name': 'parseNumber:if_1 = 1#[2, -1]',
                                'indexes': [],
                                'children': []}]},
                             {'id': 45,
                              'name': 'hasNext',
                              'indexes': [],
                              'children': []},
                             {'id': 46,
                              'name': 'parseNumber:while_1 ? [3]',
                              'indexes': [2],
                              'children': [{'id': 47,
                                'name': 'peek',
                                'indexes': [],
                                'children': []},
                               {'id': 48,
                                'name': 'parseNumber:if_1 = 1#[3, -1]',
                                'indexes': [],
                                'children': []}]},
                             {'id': 49,
                              'name': 'hasNext',
                              'indexes': [],
                              'children': []}]}]}]}]}]}]}]},
               {'id': 50,
                'name': 'parseMultiplication:while_1 ? [1]',
                'indexes': [],
                'children': [{'id': 51,
                  'name': 'skipWhitespace',
                  'indexes': [],
                  'children': [{'id': 52,
                    'name': 'hasNext',
                    'indexes': [],
                    'children': []}]},
                 {'id': 53, 'name': 'peek', 'indexes': [], 'children': []},
                 {'id': 54,
                  'name': 'parseMultiplication:if_1 = 2#[1, -1]',
                  'indexes': [],
                  'children': []}]}]},
             {'id': 55,
              'name': 'parseAddition:while_1 ? [1]',
              'indexes': [],
              'children': [{'id': 56,
                'name': 'skipWhitespace',
                'indexes': [],
                'children': [{'id': 57,
                  'name': 'hasNext',
                  'indexes': [],
                  'children': []}]},
               {'id': 58, 'name': 'peek', 'indexes': [], 'children': []},
               {'id': 59,
                'name': 'parseAddition:if_1 = 2#[1, -1]',
                'indexes': [],
                'children': []}]}]}]},
         {'id': 60,
          'name': 'skipWhitespace',
          'indexes': [],
          'children': [{'id': 61,
            'name': 'hasNext',
            'indexes': [],
            'children': []}]},
         {'id': 62, 'name': 'hasNext', 'indexes': [], 'children': []}]},
       4: {'id': 4,
        'name': 'parseExpression',
        'indexes': [],
        'children': [{'id': 5,
          'name': 'parseAddition',
          'indexes': [],
          'children': [{'id': 6,
            'name': 'parseMultiplication',
            'indexes': [],
            'children': [{'id': 7,
              'name': 'parseParenthesis',
              'indexes': [],
              'children': [{'id': 8,
                'name': 'skipWhitespace',
                'indexes': [],
                'children': [{'id': 9,
                  'name': 'hasNext',
                  'indexes': [],
                  'children': []},
                 {'id': 10,
                  'name': 'skipWhitespace:while_1 ? [1]',
                  'indexes': [],
                  'children': [{'id': 11,
                    'name': 'peek',
                    'indexes': [],
                    'children': []},
                   {'id': 12,
                    'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                    'indexes': [],
                    'children': []}]}]},
               {'id': 13, 'name': 'peek', 'indexes': [], 'children': []},
               {'id': 14,
                'name': 'parseParenthesis:if_1 = 1#[-1]',
                'indexes': [],
                'children': [{'id': 15,
                  'name': 'parseNegative',
                  'indexes': [],
                  'children': [{'id': 16,
                    'name': 'skipWhitespace',
                    'indexes': [],
                    'children': [{'id': 17,
                      'name': 'hasNext',
                      'indexes': [],
                      'children': []},
                     {'id': 18,
                      'name': 'skipWhitespace:while_1 ? [1]',
                      'indexes': [],
                      'children': [{'id': 19,
                        'name': 'peek',
                        'indexes': [],
                        'children': []},
                       {'id': 20,
                        'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                        'indexes': [],
                        'children': []}]}]},
                   {'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
                   {'id': 22,
                    'name': 'parseNegative:if_1 = 1#[-1]',
                    'indexes': [],
                    'children': [{'id': 23,
                      'name': 'parseValue',
                      'indexes': [],
                      'children': [{'id': 24,
                        'name': 'skipWhitespace',
                        'indexes': [],
                        'children': [{'id': 25,
                          'name': 'hasNext',
                          'indexes': [],
                          'children': []},
                         {'id': 26,
                          'name': 'skipWhitespace:while_1 ? [1]',
                          'indexes': [],
                          'children': [{'id': 27,
                            'name': 'peek',
                            'indexes': [],
                            'children': []},
                           {'id': 28,
                            'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                            'indexes': [],
                            'children': []}]}]},
                       {'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
                       {'id': 30,
                        'name': 'parseValue:if_1 = 0#[-1]',
                        'indexes': [],
                        'children': [{'id': 31,
                          'name': 'parseNumber',
                          'indexes': [],
                          'children': [{'id': 32,
                            'name': 'skipWhitespace',
                            'indexes': [],
                            'children': [{'id': 33,
                              'name': 'hasNext',
                              'indexes': [],
                              'children': []},
                             {'id': 34,
                              'name': 'skipWhitespace:while_1 ? [1]',
                              'indexes': [],
                              'children': [{'id': 35,
                                'name': 'peek',
                                'indexes': [],
                                'children': []},
                               {'id': 36,
                                'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                                'indexes': [],
                                'children': []}]}]},
                           {'id': 37,
                            'name': 'hasNext',
                            'indexes': [],
                            'children': []},
                           {'id': 38,
                            'name': 'parseNumber:while_1 ? [1]',
                            'indexes': [0],
                            'children': [{'id': 39,
                              'name': 'peek',
                              'indexes': [],
                              'children': []},
                             {'id': 40,
                              'name': 'parseNumber:if_1 = 1#[1, -1]',
                              'indexes': [],
                              'children': []}]},
                           {'id': 41,
                            'name': 'hasNext',
                            'indexes': [],
                            'children': []},
                           {'id': 42,
                            'name': 'parseNumber:while_1 ? [2]',
                            'indexes': [1],
                            'children': [{'id': 43,
                              'name': 'peek',
                              'indexes': [],
                              'children': []},
                             {'id': 44,
                              'name': 'parseNumber:if_1 = 1#[2, -1]',
                              'indexes': [],
                              'children': []}]},
                           {'id': 45,
                            'name': 'hasNext',
                            'indexes': [],
                            'children': []},
                           {'id': 46,
                            'name': 'parseNumber:while_1 ? [3]',
                            'indexes': [2],
                            'children': [{'id': 47,
                              'name': 'peek',
                              'indexes': [],
                              'children': []},
                             {'id': 48,
                              'name': 'parseNumber:if_1 = 1#[3, -1]',
                              'indexes': [],
                              'children': []}]},
                           {'id': 49,
                            'name': 'hasNext',
                            'indexes': [],
                            'children': []}]}]}]}]}]}]}]},
             {'id': 50,
              'name': 'parseMultiplication:while_1 ? [1]',
              'indexes': [],
              'children': [{'id': 51,
                'name': 'skipWhitespace',
                'indexes': [],
                'children': [{'id': 52,
                  'name': 'hasNext',
                  'indexes': [],
                  'children': []}]},
               {'id': 53, 'name': 'peek', 'indexes': [], 'children': []},
               {'id': 54,
                'name': 'parseMultiplication:if_1 = 2#[1, -1]',
                'indexes': [],
                'children': []}]}]},
           {'id': 55,
            'name': 'parseAddition:while_1 ? [1]',
            'indexes': [],
            'children': [{'id': 56,
              'name': 'skipWhitespace',
              'indexes': [],
              'children': [{'id': 57,
                'name': 'hasNext',
                'indexes': [],
                'children': []}]},
             {'id': 58, 'name': 'peek', 'indexes': [], 'children': []},
             {'id': 59,
              'name': 'parseAddition:if_1 = 2#[1, -1]',
              'indexes': [],
              'children': []}]}]}]},
       60: {'id': 60,
        'name': 'skipWhitespace',
        'indexes': [],
        'children': [{'id': 61, 'name': 'hasNext', 'indexes': [], 'children': []}]},
       62: {'id': 62, 'name': 'hasNext', 'indexes': [], 'children': []},
       5: {'id': 5,
        'name': 'parseAddition',
        'indexes': [],
        'children': [{'id': 6,
          'name': 'parseMultiplication',
          'indexes': [],
          'children': [{'id': 7,
            'name': 'parseParenthesis',
            'indexes': [],
            'children': [{'id': 8,
              'name': 'skipWhitespace',
              'indexes': [],
              'children': [{'id': 9,
                'name': 'hasNext',
                'indexes': [],
                'children': []},
               {'id': 10,
                'name': 'skipWhitespace:while_1 ? [1]',
                'indexes': [],
                'children': [{'id': 11,
                  'name': 'peek',
                  'indexes': [],
                  'children': []},
                 {'id': 12,
                  'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                  'indexes': [],
                  'children': []}]}]},
             {'id': 13, 'name': 'peek', 'indexes': [], 'children': []},
             {'id': 14,
              'name': 'parseParenthesis:if_1 = 1#[-1]',
              'indexes': [],
              'children': [{'id': 15,
                'name': 'parseNegative',
                'indexes': [],
                'children': [{'id': 16,
                  'name': 'skipWhitespace',
                  'indexes': [],
                  'children': [{'id': 17,
                    'name': 'hasNext',
                    'indexes': [],
                    'children': []},
                   {'id': 18,
                    'name': 'skipWhitespace:while_1 ? [1]',
                    'indexes': [],
                    'children': [{'id': 19,
                      'name': 'peek',
                      'indexes': [],
                      'children': []},
                     {'id': 20,
                      'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                      'indexes': [],
                      'children': []}]}]},
                 {'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
                 {'id': 22,
                  'name': 'parseNegative:if_1 = 1#[-1]',
                  'indexes': [],
                  'children': [{'id': 23,
                    'name': 'parseValue',
                    'indexes': [],
                    'children': [{'id': 24,
                      'name': 'skipWhitespace',
                      'indexes': [],
                      'children': [{'id': 25,
                        'name': 'hasNext',
                        'indexes': [],
                        'children': []},
                       {'id': 26,
                        'name': 'skipWhitespace:while_1 ? [1]',
                        'indexes': [],
                        'children': [{'id': 27,
                          'name': 'peek',
                          'indexes': [],
                          'children': []},
                         {'id': 28,
                          'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                          'indexes': [],
                          'children': []}]}]},
                     {'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
                     {'id': 30,
                      'name': 'parseValue:if_1 = 0#[-1]',
                      'indexes': [],
                      'children': [{'id': 31,
                        'name': 'parseNumber',
                        'indexes': [],
                        'children': [{'id': 32,
                          'name': 'skipWhitespace',
                          'indexes': [],
                          'children': [{'id': 33,
                            'name': 'hasNext',
                            'indexes': [],
                            'children': []},
                           {'id': 34,
                            'name': 'skipWhitespace:while_1 ? [1]',
                            'indexes': [],
                            'children': [{'id': 35,
                              'name': 'peek',
                              'indexes': [],
                              'children': []},
                             {'id': 36,
                              'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                              'indexes': [],
                              'children': []}]}]},
                         {'id': 37,
                          'name': 'hasNext',
                          'indexes': [],
                          'children': []},
                         {'id': 38,
                          'name': 'parseNumber:while_1 ? [1]',
                          'indexes': [0],
                          'children': [{'id': 39,
                            'name': 'peek',
                            'indexes': [],
                            'children': []},
                           {'id': 40,
                            'name': 'parseNumber:if_1 = 1#[1, -1]',
                            'indexes': [],
                            'children': []}]},
                         {'id': 41,
                          'name': 'hasNext',
                          'indexes': [],
                          'children': []},
                         {'id': 42,
                          'name': 'parseNumber:while_1 ? [2]',
                          'indexes': [1],
                          'children': [{'id': 43,
                            'name': 'peek',
                            'indexes': [],
                            'children': []},
                           {'id': 44,
                            'name': 'parseNumber:if_1 = 1#[2, -1]',
                            'indexes': [],
                            'children': []}]},
                         {'id': 45,
                          'name': 'hasNext',
                          'indexes': [],
                          'children': []},
                         {'id': 46,
                          'name': 'parseNumber:while_1 ? [3]',
                          'indexes': [2],
                          'children': [{'id': 47,
                            'name': 'peek',
                            'indexes': [],
                            'children': []},
                           {'id': 48,
                            'name': 'parseNumber:if_1 = 1#[3, -1]',
                            'indexes': [],
                            'children': []}]},
                         {'id': 49,
                          'name': 'hasNext',
                          'indexes': [],
                          'children': []}]}]}]}]}]}]}]},
           {'id': 50,
            'name': 'parseMultiplication:while_1 ? [1]',
            'indexes': [],
            'children': [{'id': 51,
              'name': 'skipWhitespace',
              'indexes': [],
              'children': [{'id': 52,
                'name': 'hasNext',
                'indexes': [],
                'children': []}]},
             {'id': 53, 'name': 'peek', 'indexes': [], 'children': []},
             {'id': 54,
              'name': 'parseMultiplication:if_1 = 2#[1, -1]',
              'indexes': [],
              'children': []}]}]},
         {'id': 55,
          'name': 'parseAddition:while_1 ? [1]',
          'indexes': [],
          'children': [{'id': 56,
            'name': 'skipWhitespace',
            'indexes': [],
            'children': [{'id': 57,
              'name': 'hasNext',
              'indexes': [],
              'children': []}]},
           {'id': 58, 'name': 'peek', 'indexes': [], 'children': []},
           {'id': 59,
            'name': 'parseAddition:if_1 = 2#[1, -1]',
            'indexes': [],
            'children': []}]}]},
       6: {'id': 6,
        'name': 'parseMultiplication',
        'indexes': [],
        'children': [{'id': 7,
          'name': 'parseParenthesis',
          'indexes': [],
          'children': [{'id': 8,
            'name': 'skipWhitespace',
            'indexes': [],
            'children': [{'id': 9, 'name': 'hasNext', 'indexes': [], 'children': []},
             {'id': 10,
              'name': 'skipWhitespace:while_1 ? [1]',
              'indexes': [],
              'children': [{'id': 11, 'name': 'peek', 'indexes': [], 'children': []},
               {'id': 12,
                'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                'indexes': [],
                'children': []}]}]},
           {'id': 13, 'name': 'peek', 'indexes': [], 'children': []},
           {'id': 14,
            'name': 'parseParenthesis:if_1 = 1#[-1]',
            'indexes': [],
            'children': [{'id': 15,
              'name': 'parseNegative',
              'indexes': [],
              'children': [{'id': 16,
                'name': 'skipWhitespace',
                'indexes': [],
                'children': [{'id': 17,
                  'name': 'hasNext',
                  'indexes': [],
                  'children': []},
                 {'id': 18,
                  'name': 'skipWhitespace:while_1 ? [1]',
                  'indexes': [],
                  'children': [{'id': 19,
                    'name': 'peek',
                    'indexes': [],
                    'children': []},
                   {'id': 20,
                    'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                    'indexes': [],
                    'children': []}]}]},
               {'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
               {'id': 22,
                'name': 'parseNegative:if_1 = 1#[-1]',
                'indexes': [],
                'children': [{'id': 23,
                  'name': 'parseValue',
                  'indexes': [],
                  'children': [{'id': 24,
                    'name': 'skipWhitespace',
                    'indexes': [],
                    'children': [{'id': 25,
                      'name': 'hasNext',
                      'indexes': [],
                      'children': []},
                     {'id': 26,
                      'name': 'skipWhitespace:while_1 ? [1]',
                      'indexes': [],
                      'children': [{'id': 27,
                        'name': 'peek',
                        'indexes': [],
                        'children': []},
                       {'id': 28,
                        'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                        'indexes': [],
                        'children': []}]}]},
                   {'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
                   {'id': 30,
                    'name': 'parseValue:if_1 = 0#[-1]',
                    'indexes': [],
                    'children': [{'id': 31,
                      'name': 'parseNumber',
                      'indexes': [],
                      'children': [{'id': 32,
                        'name': 'skipWhitespace',
                        'indexes': [],
                        'children': [{'id': 33,
                          'name': 'hasNext',
                          'indexes': [],
                          'children': []},
                         {'id': 34,
                          'name': 'skipWhitespace:while_1 ? [1]',
                          'indexes': [],
                          'children': [{'id': 35,
                            'name': 'peek',
                            'indexes': [],
                            'children': []},
                           {'id': 36,
                            'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                            'indexes': [],
                            'children': []}]}]},
                       {'id': 37, 'name': 'hasNext', 'indexes': [], 'children': []},
                       {'id': 38,
                        'name': 'parseNumber:while_1 ? [1]',
                        'indexes': [0],
                        'children': [{'id': 39,
                          'name': 'peek',
                          'indexes': [],
                          'children': []},
                         {'id': 40,
                          'name': 'parseNumber:if_1 = 1#[1, -1]',
                          'indexes': [],
                          'children': []}]},
                       {'id': 41, 'name': 'hasNext', 'indexes': [], 'children': []},
                       {'id': 42,
                        'name': 'parseNumber:while_1 ? [2]',
                        'indexes': [1],
                        'children': [{'id': 43,
                          'name': 'peek',
                          'indexes': [],
                          'children': []},
                         {'id': 44,
                          'name': 'parseNumber:if_1 = 1#[2, -1]',
                          'indexes': [],
                          'children': []}]},
                       {'id': 45, 'name': 'hasNext', 'indexes': [], 'children': []},
                       {'id': 46,
                        'name': 'parseNumber:while_1 ? [3]',
                        'indexes': [2],
                        'children': [{'id': 47,
                          'name': 'peek',
                          'indexes': [],
                          'children': []},
                         {'id': 48,
                          'name': 'parseNumber:if_1 = 1#[3, -1]',
                          'indexes': [],
                          'children': []}]},
                       {'id': 49,
                        'name': 'hasNext',
                        'indexes': [],
                        'children': []}]}]}]}]}]}]}]},
         {'id': 50,
          'name': 'parseMultiplication:while_1 ? [1]',
          'indexes': [],
          'children': [{'id': 51,
            'name': 'skipWhitespace',
            'indexes': [],
            'children': [{'id': 52,
              'name': 'hasNext',
              'indexes': [],
              'children': []}]},
           {'id': 53, 'name': 'peek', 'indexes': [], 'children': []},
           {'id': 54,
            'name': 'parseMultiplication:if_1 = 2#[1, -1]',
            'indexes': [],
            'children': []}]}]},
       55: {'id': 55,
        'name': 'parseAddition:while_1 ? [1]',
        'indexes': [],
        'children': [{'id': 56,
          'name': 'skipWhitespace',
          'indexes': [],
          'children': [{'id': 57,
            'name': 'hasNext',
            'indexes': [],
            'children': []}]},
         {'id': 58, 'name': 'peek', 'indexes': [], 'children': []},
         {'id': 59,
          'name': 'parseAddition:if_1 = 2#[1, -1]',
          'indexes': [],
          'children': []}]},
       7: {'id': 7,
        'name': 'parseParenthesis',
        'indexes': [],
        'children': [{'id': 8,
          'name': 'skipWhitespace',
          'indexes': [],
          'children': [{'id': 9, 'name': 'hasNext', 'indexes': [], 'children': []},
           {'id': 10,
            'name': 'skipWhitespace:while_1 ? [1]',
            'indexes': [],
            'children': [{'id': 11, 'name': 'peek', 'indexes': [], 'children': []},
             {'id': 12,
              'name': 'skipWhitespace:if_1 = 1#[1, -1]',
              'indexes': [],
              'children': []}]}]},
         {'id': 13, 'name': 'peek', 'indexes': [], 'children': []},
         {'id': 14,
          'name': 'parseParenthesis:if_1 = 1#[-1]',
          'indexes': [],
          'children': [{'id': 15,
            'name': 'parseNegative',
            'indexes': [],
            'children': [{'id': 16,
              'name': 'skipWhitespace',
              'indexes': [],
              'children': [{'id': 17,
                'name': 'hasNext',
                'indexes': [],
                'children': []},
               {'id': 18,
                'name': 'skipWhitespace:while_1 ? [1]',
                'indexes': [],
                'children': [{'id': 19,
                  'name': 'peek',
                  'indexes': [],
                  'children': []},
                 {'id': 20,
                  'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                  'indexes': [],
                  'children': []}]}]},
             {'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
             {'id': 22,
              'name': 'parseNegative:if_1 = 1#[-1]',
              'indexes': [],
              'children': [{'id': 23,
                'name': 'parseValue',
                'indexes': [],
                'children': [{'id': 24,
                  'name': 'skipWhitespace',
                  'indexes': [],
                  'children': [{'id': 25,
                    'name': 'hasNext',
                    'indexes': [],
                    'children': []},
                   {'id': 26,
                    'name': 'skipWhitespace:while_1 ? [1]',
                    'indexes': [],
                    'children': [{'id': 27,
                      'name': 'peek',
                      'indexes': [],
                      'children': []},
                     {'id': 28,
                      'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                      'indexes': [],
                      'children': []}]}]},
                 {'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
                 {'id': 30,
                  'name': 'parseValue:if_1 = 0#[-1]',
                  'indexes': [],
                  'children': [{'id': 31,
                    'name': 'parseNumber',
                    'indexes': [],
                    'children': [{'id': 32,
                      'name': 'skipWhitespace',
                      'indexes': [],
                      'children': [{'id': 33,
                        'name': 'hasNext',
                        'indexes': [],
                        'children': []},
                       {'id': 34,
                        'name': 'skipWhitespace:while_1 ? [1]',
                        'indexes': [],
                        'children': [{'id': 35,
                          'name': 'peek',
                          'indexes': [],
                          'children': []},
                         {'id': 36,
                          'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                          'indexes': [],
                          'children': []}]}]},
                     {'id': 37, 'name': 'hasNext', 'indexes': [], 'children': []},
                     {'id': 38,
                      'name': 'parseNumber:while_1 ? [1]',
                      'indexes': [0],
                      'children': [{'id': 39,
                        'name': 'peek',
                        'indexes': [],
                        'children': []},
                       {'id': 40,
                        'name': 'parseNumber:if_1 = 1#[1, -1]',
                        'indexes': [],
                        'children': []}]},
                     {'id': 41, 'name': 'hasNext', 'indexes': [], 'children': []},
                     {'id': 42,
                      'name': 'parseNumber:while_1 ? [2]',
                      'indexes': [1],
                      'children': [{'id': 43,
                        'name': 'peek',
                        'indexes': [],
                        'children': []},
                       {'id': 44,
                        'name': 'parseNumber:if_1 = 1#[2, -1]',
                        'indexes': [],
                        'children': []}]},
                     {'id': 45, 'name': 'hasNext', 'indexes': [], 'children': []},
                     {'id': 46,
                      'name': 'parseNumber:while_1 ? [3]',
                      'indexes': [2],
                      'children': [{'id': 47,
                        'name': 'peek',
                        'indexes': [],
                        'children': []},
                       {'id': 48,
                        'name': 'parseNumber:if_1 = 1#[3, -1]',
                        'indexes': [],
                        'children': []}]},
                     {'id': 49,
                      'name': 'hasNext',
                      'indexes': [],
                      'children': []}]}]}]}]}]}]}]},
       50: {'id': 50,
        'name': 'parseMultiplication:while_1 ? [1]',
        'indexes': [],
        'children': [{'id': 51,
          'name': 'skipWhitespace',
          'indexes': [],
          'children': [{'id': 52,
            'name': 'hasNext',
            'indexes': [],
            'children': []}]},
         {'id': 53, 'name': 'peek', 'indexes': [], 'children': []},
         {'id': 54,
          'name': 'parseMultiplication:if_1 = 2#[1, -1]',
          'indexes': [],
          'children': []}]},
       8: {'id': 8,
        'name': 'skipWhitespace',
        'indexes': [],
        'children': [{'id': 9, 'name': 'hasNext', 'indexes': [], 'children': []},
         {'id': 10,
          'name': 'skipWhitespace:while_1 ? [1]',
          'indexes': [],
          'children': [{'id': 11, 'name': 'peek', 'indexes': [], 'children': []},
           {'id': 12,
            'name': 'skipWhitespace:if_1 = 1#[1, -1]',
            'indexes': [],
            'children': []}]}]},
       13: {'id': 13, 'name': 'peek', 'indexes': [], 'children': []},
       14: {'id': 14,
        'name': 'parseParenthesis:if_1 = 1#[-1]',
        'indexes': [],
        'children': [{'id': 15,
          'name': 'parseNegative',
          'indexes': [],
          'children': [{'id': 16,
            'name': 'skipWhitespace',
            'indexes': [],
            'children': [{'id': 17,
              'name': 'hasNext',
              'indexes': [],
              'children': []},
             {'id': 18,
              'name': 'skipWhitespace:while_1 ? [1]',
              'indexes': [],
              'children': [{'id': 19, 'name': 'peek', 'indexes': [], 'children': []},
               {'id': 20,
                'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                'indexes': [],
                'children': []}]}]},
           {'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
           {'id': 22,
            'name': 'parseNegative:if_1 = 1#[-1]',
            'indexes': [],
            'children': [{'id': 23,
              'name': 'parseValue',
              'indexes': [],
              'children': [{'id': 24,
                'name': 'skipWhitespace',
                'indexes': [],
                'children': [{'id': 25,
                  'name': 'hasNext',
                  'indexes': [],
                  'children': []},
                 {'id': 26,
                  'name': 'skipWhitespace:while_1 ? [1]',
                  'indexes': [],
                  'children': [{'id': 27,
                    'name': 'peek',
                    'indexes': [],
                    'children': []},
                   {'id': 28,
                    'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                    'indexes': [],
                    'children': []}]}]},
               {'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
               {'id': 30,
                'name': 'parseValue:if_1 = 0#[-1]',
                'indexes': [],
                'children': [{'id': 31,
                  'name': 'parseNumber',
                  'indexes': [],
                  'children': [{'id': 32,
                    'name': 'skipWhitespace',
                    'indexes': [],
                    'children': [{'id': 33,
                      'name': 'hasNext',
                      'indexes': [],
                      'children': []},
                     {'id': 34,
                      'name': 'skipWhitespace:while_1 ? [1]',
                      'indexes': [],
                      'children': [{'id': 35,
                        'name': 'peek',
                        'indexes': [],
                        'children': []},
                       {'id': 36,
                        'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                        'indexes': [],
                        'children': []}]}]},
                   {'id': 37, 'name': 'hasNext', 'indexes': [], 'children': []},
                   {'id': 38,
                    'name': 'parseNumber:while_1 ? [1]',
                    'indexes': [0],
                    'children': [{'id': 39,
                      'name': 'peek',
                      'indexes': [],
                      'children': []},
                     {'id': 40,
                      'name': 'parseNumber:if_1 = 1#[1, -1]',
                      'indexes': [],
                      'children': []}]},
                   {'id': 41, 'name': 'hasNext', 'indexes': [], 'children': []},
                   {'id': 42,
                    'name': 'parseNumber:while_1 ? [2]',
                    'indexes': [1],
                    'children': [{'id': 43,
                      'name': 'peek',
                      'indexes': [],
                      'children': []},
                     {'id': 44,
                      'name': 'parseNumber:if_1 = 1#[2, -1]',
                      'indexes': [],
                      'children': []}]},
                   {'id': 45, 'name': 'hasNext', 'indexes': [], 'children': []},
                   {'id': 46,
                    'name': 'parseNumber:while_1 ? [3]',
                    'indexes': [2],
                    'children': [{'id': 47,
                      'name': 'peek',
                      'indexes': [],
                      'children': []},
                     {'id': 48,
                      'name': 'parseNumber:if_1 = 1#[3, -1]',
                      'indexes': [],
                      'children': []}]},
                   {'id': 49,
                    'name': 'hasNext',
                    'indexes': [],
                    'children': []}]}]}]}]}]}]},
       9: {'id': 9, 'name': 'hasNext', 'indexes': [], 'children': []},
       10: {'id': 10,
        'name': 'skipWhitespace:while_1 ? [1]',
        'indexes': [],
        'children': [{'id': 11, 'name': 'peek', 'indexes': [], 'children': []},
         {'id': 12,
          'name': 'skipWhitespace:if_1 = 1#[1, -1]',
          'indexes': [],
          'children': []}]},
       11: {'id': 11, 'name': 'peek', 'indexes': [], 'children': []},
       12: {'id': 12,
        'name': 'skipWhitespace:if_1 = 1#[1, -1]',
        'indexes': [],
        'children': []},
       15: {'id': 15,
        'name': 'parseNegative',
        'indexes': [],
        'children': [{'id': 16,
          'name': 'skipWhitespace',
          'indexes': [],
          'children': [{'id': 17, 'name': 'hasNext', 'indexes': [], 'children': []},
           {'id': 18,
            'name': 'skipWhitespace:while_1 ? [1]',
            'indexes': [],
            'children': [{'id': 19, 'name': 'peek', 'indexes': [], 'children': []},
             {'id': 20,
              'name': 'skipWhitespace:if_1 = 1#[1, -1]',
              'indexes': [],
              'children': []}]}]},
         {'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
         {'id': 22,
          'name': 'parseNegative:if_1 = 1#[-1]',
          'indexes': [],
          'children': [{'id': 23,
            'name': 'parseValue',
            'indexes': [],
            'children': [{'id': 24,
              'name': 'skipWhitespace',
              'indexes': [],
              'children': [{'id': 25,
                'name': 'hasNext',
                'indexes': [],
                'children': []},
               {'id': 26,
                'name': 'skipWhitespace:while_1 ? [1]',
                'indexes': [],
                'children': [{'id': 27,
                  'name': 'peek',
                  'indexes': [],
                  'children': []},
                 {'id': 28,
                  'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                  'indexes': [],
                  'children': []}]}]},
             {'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
             {'id': 30,
              'name': 'parseValue:if_1 = 0#[-1]',
              'indexes': [],
              'children': [{'id': 31,
                'name': 'parseNumber',
                'indexes': [],
                'children': [{'id': 32,
                  'name': 'skipWhitespace',
                  'indexes': [],
                  'children': [{'id': 33,
                    'name': 'hasNext',
                    'indexes': [],
                    'children': []},
                   {'id': 34,
                    'name': 'skipWhitespace:while_1 ? [1]',
                    'indexes': [],
                    'children': [{'id': 35,
                      'name': 'peek',
                      'indexes': [],
                      'children': []},
                     {'id': 36,
                      'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                      'indexes': [],
                      'children': []}]}]},
                 {'id': 37, 'name': 'hasNext', 'indexes': [], 'children': []},
                 {'id': 38,
                  'name': 'parseNumber:while_1 ? [1]',
                  'indexes': [0],
                  'children': [{'id': 39,
                    'name': 'peek',
                    'indexes': [],
                    'children': []},
                   {'id': 40,
                    'name': 'parseNumber:if_1 = 1#[1, -1]',
                    'indexes': [],
                    'children': []}]},
                 {'id': 41, 'name': 'hasNext', 'indexes': [], 'children': []},
                 {'id': 42,
                  'name': 'parseNumber:while_1 ? [2]',
                  'indexes': [1],
                  'children': [{'id': 43,
                    'name': 'peek',
                    'indexes': [],
                    'children': []},
                   {'id': 44,
                    'name': 'parseNumber:if_1 = 1#[2, -1]',
                    'indexes': [],
                    'children': []}]},
                 {'id': 45, 'name': 'hasNext', 'indexes': [], 'children': []},
                 {'id': 46,
                  'name': 'parseNumber:while_1 ? [3]',
                  'indexes': [2],
                  'children': [{'id': 47,
                    'name': 'peek',
                    'indexes': [],
                    'children': []},
                   {'id': 48,
                    'name': 'parseNumber:if_1 = 1#[3, -1]',
                    'indexes': [],
                    'children': []}]},
                 {'id': 49,
                  'name': 'hasNext',
                  'indexes': [],
                  'children': []}]}]}]}]}]},
       16: {'id': 16,
        'name': 'skipWhitespace',
        'indexes': [],
        'children': [{'id': 17, 'name': 'hasNext', 'indexes': [], 'children': []},
         {'id': 18,
          'name': 'skipWhitespace:while_1 ? [1]',
          'indexes': [],
          'children': [{'id': 19, 'name': 'peek', 'indexes': [], 'children': []},
           {'id': 20,
            'name': 'skipWhitespace:if_1 = 1#[1, -1]',
            'indexes': [],
            'children': []}]}]},
       21: {'id': 21, 'name': 'peek', 'indexes': [], 'children': []},
       22: {'id': 22,
        'name': 'parseNegative:if_1 = 1#[-1]',
        'indexes': [],
        'children': [{'id': 23,
          'name': 'parseValue',
          'indexes': [],
          'children': [{'id': 24,
            'name': 'skipWhitespace',
            'indexes': [],
            'children': [{'id': 25,
              'name': 'hasNext',
              'indexes': [],
              'children': []},
             {'id': 26,
              'name': 'skipWhitespace:while_1 ? [1]',
              'indexes': [],
              'children': [{'id': 27, 'name': 'peek', 'indexes': [], 'children': []},
               {'id': 28,
                'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                'indexes': [],
                'children': []}]}]},
           {'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
           {'id': 30,
            'name': 'parseValue:if_1 = 0#[-1]',
            'indexes': [],
            'children': [{'id': 31,
              'name': 'parseNumber',
              'indexes': [],
              'children': [{'id': 32,
                'name': 'skipWhitespace',
                'indexes': [],
                'children': [{'id': 33,
                  'name': 'hasNext',
                  'indexes': [],
                  'children': []},
                 {'id': 34,
                  'name': 'skipWhitespace:while_1 ? [1]',
                  'indexes': [],
                  'children': [{'id': 35,
                    'name': 'peek',
                    'indexes': [],
                    'children': []},
                   {'id': 36,
                    'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                    'indexes': [],
                    'children': []}]}]},
               {'id': 37, 'name': 'hasNext', 'indexes': [], 'children': []},
               {'id': 38,
                'name': 'parseNumber:while_1 ? [1]',
                'indexes': [0],
                'children': [{'id': 39,
                  'name': 'peek',
                  'indexes': [],
                  'children': []},
                 {'id': 40,
                  'name': 'parseNumber:if_1 = 1#[1, -1]',
                  'indexes': [],
                  'children': []}]},
               {'id': 41, 'name': 'hasNext', 'indexes': [], 'children': []},
               {'id': 42,
                'name': 'parseNumber:while_1 ? [2]',
                'indexes': [1],
                'children': [{'id': 43,
                  'name': 'peek',
                  'indexes': [],
                  'children': []},
                 {'id': 44,
                  'name': 'parseNumber:if_1 = 1#[2, -1]',
                  'indexes': [],
                  'children': []}]},
               {'id': 45, 'name': 'hasNext', 'indexes': [], 'children': []},
               {'id': 46,
                'name': 'parseNumber:while_1 ? [3]',
                'indexes': [2],
                'children': [{'id': 47,
                  'name': 'peek',
                  'indexes': [],
                  'children': []},
                 {'id': 48,
                  'name': 'parseNumber:if_1 = 1#[3, -1]',
                  'indexes': [],
                  'children': []}]},
               {'id': 49, 'name': 'hasNext', 'indexes': [], 'children': []}]}]}]}]},
       17: {'id': 17, 'name': 'hasNext', 'indexes': [], 'children': []},
       18: {'id': 18,
        'name': 'skipWhitespace:while_1 ? [1]',
        'indexes': [],
        'children': [{'id': 19, 'name': 'peek', 'indexes': [], 'children': []},
         {'id': 20,
          'name': 'skipWhitespace:if_1 = 1#[1, -1]',
          'indexes': [],
          'children': []}]},
       19: {'id': 19, 'name': 'peek', 'indexes': [], 'children': []},
       20: {'id': 20,
        'name': 'skipWhitespace:if_1 = 1#[1, -1]',
        'indexes': [],
        'children': []},
       23: {'id': 23,
        'name': 'parseValue',
        'indexes': [],
        'children': [{'id': 24,
          'name': 'skipWhitespace',
          'indexes': [],
          'children': [{'id': 25, 'name': 'hasNext', 'indexes': [], 'children': []},
           {'id': 26,
            'name': 'skipWhitespace:while_1 ? [1]',
            'indexes': [],
            'children': [{'id': 27, 'name': 'peek', 'indexes': [], 'children': []},
             {'id': 28,
              'name': 'skipWhitespace:if_1 = 1#[1, -1]',
              'indexes': [],
              'children': []}]}]},
         {'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
         {'id': 30,
          'name': 'parseValue:if_1 = 0#[-1]',
          'indexes': [],
          'children': [{'id': 31,
            'name': 'parseNumber',
            'indexes': [],
            'children': [{'id': 32,
              'name': 'skipWhitespace',
              'indexes': [],
              'children': [{'id': 33,
                'name': 'hasNext',
                'indexes': [],
                'children': []},
               {'id': 34,
                'name': 'skipWhitespace:while_1 ? [1]',
                'indexes': [],
                'children': [{'id': 35,
                  'name': 'peek',
                  'indexes': [],
                  'children': []},
                 {'id': 36,
                  'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                  'indexes': [],
                  'children': []}]}]},
             {'id': 37, 'name': 'hasNext', 'indexes': [], 'children': []},
             {'id': 38,
              'name': 'parseNumber:while_1 ? [1]',
              'indexes': [0],
              'children': [{'id': 39, 'name': 'peek', 'indexes': [], 'children': []},
               {'id': 40,
                'name': 'parseNumber:if_1 = 1#[1, -1]',
                'indexes': [],
                'children': []}]},
             {'id': 41, 'name': 'hasNext', 'indexes': [], 'children': []},
             {'id': 42,
              'name': 'parseNumber:while_1 ? [2]',
              'indexes': [1],
              'children': [{'id': 43, 'name': 'peek', 'indexes': [], 'children': []},
               {'id': 44,
                'name': 'parseNumber:if_1 = 1#[2, -1]',
                'indexes': [],
                'children': []}]},
             {'id': 45, 'name': 'hasNext', 'indexes': [], 'children': []},
             {'id': 46,
              'name': 'parseNumber:while_1 ? [3]',
              'indexes': [2],
              'children': [{'id': 47, 'name': 'peek', 'indexes': [], 'children': []},
               {'id': 48,
                'name': 'parseNumber:if_1 = 1#[3, -1]',
                'indexes': [],
                'children': []}]},
             {'id': 49, 'name': 'hasNext', 'indexes': [], 'children': []}]}]}]},
       24: {'id': 24,
        'name': 'skipWhitespace',
        'indexes': [],
        'children': [{'id': 25, 'name': 'hasNext', 'indexes': [], 'children': []},
         {'id': 26,
          'name': 'skipWhitespace:while_1 ? [1]',
          'indexes': [],
          'children': [{'id': 27, 'name': 'peek', 'indexes': [], 'children': []},
           {'id': 28,
            'name': 'skipWhitespace:if_1 = 1#[1, -1]',
            'indexes': [],
            'children': []}]}]},
       29: {'id': 29, 'name': 'peek', 'indexes': [], 'children': []},
       30: {'id': 30,
        'name': 'parseValue:if_1 = 0#[-1]',
        'indexes': [],
        'children': [{'id': 31,
          'name': 'parseNumber',
          'indexes': [],
          'children': [{'id': 32,
            'name': 'skipWhitespace',
            'indexes': [],
            'children': [{'id': 33,
              'name': 'hasNext',
              'indexes': [],
              'children': []},
             {'id': 34,
              'name': 'skipWhitespace:while_1 ? [1]',
              'indexes': [],
              'children': [{'id': 35, 'name': 'peek', 'indexes': [], 'children': []},
               {'id': 36,
                'name': 'skipWhitespace:if_1 = 1#[1, -1]',
                'indexes': [],
                'children': []}]}]},
           {'id': 37, 'name': 'hasNext', 'indexes': [], 'children': []},
           {'id': 38,
            'name': 'parseNumber:while_1 ? [1]',
            'indexes': [0],
            'children': [{'id': 39, 'name': 'peek', 'indexes': [], 'children': []},
             {'id': 40,
              'name': 'parseNumber:if_1 = 1#[1, -1]',
              'indexes': [],
              'children': []}]},
           {'id': 41, 'name': 'hasNext', 'indexes': [], 'children': []},
           {'id': 42,
            'name': 'parseNumber:while_1 ? [2]',
            'indexes': [1],
            'children': [{'id': 43, 'name': 'peek', 'indexes': [], 'children': []},
             {'id': 44,
              'name': 'parseNumber:if_1 = 1#[2, -1]',
              'indexes': [],
              'children': []}]},
           {'id': 45, 'name': 'hasNext', 'indexes': [], 'children': []},
           {'id': 46,
            'name': 'parseNumber:while_1 ? [3]',
            'indexes': [2],
            'children': [{'id': 47, 'name': 'peek', 'indexes': [], 'children': []},
             {'id': 48,
              'name': 'parseNumber:if_1 = 1#[3, -1]',
              'indexes': [],
              'children': []}]},
           {'id': 49, 'name': 'hasNext', 'indexes': [], 'children': []}]}]},
       25: {'id': 25, 'name': 'hasNext', 'indexes': [], 'children': []},
       26: {'id': 26,
        'name': 'skipWhitespace:while_1 ? [1]',
        'indexes': [],
        'children': [{'id': 27, 'name': 'peek', 'indexes': [], 'children': []},
         {'id': 28,
          'name': 'skipWhitespace:if_1 = 1#[1, -1]',
          'indexes': [],
          'children': []}]},
       27: {'id': 27, 'name': 'peek', 'indexes': [], 'children': []},
       28: {'id': 28,
        'name': 'skipWhitespace:if_1 = 1#[1, -1]',
        'indexes': [],
        'children': []},
       31: {'id': 31,
        'name': 'parseNumber',
        'indexes': [],
        'children': [{'id': 32,
          'name': 'skipWhitespace',
          'indexes': [],
          'children': [{'id': 33, 'name': 'hasNext', 'indexes': [], 'children': []},
           {'id': 34,
            'name': 'skipWhitespace:while_1 ? [1]',
            'indexes': [],
            'children': [{'id': 35, 'name': 'peek', 'indexes': [], 'children': []},
             {'id': 36,
              'name': 'skipWhitespace:if_1 = 1#[1, -1]',
              'indexes': [],
              'children': []}]}]},
         {'id': 37, 'name': 'hasNext', 'indexes': [], 'children': []},
         {'id': 38,
          'name': 'parseNumber:while_1 ? [1]',
          'indexes': [0],
          'children': [{'id': 39, 'name': 'peek', 'indexes': [], 'children': []},
           {'id': 40,
            'name': 'parseNumber:if_1 = 1#[1, -1]',
            'indexes': [],
            'children': []}]},
         {'id': 41, 'name': 'hasNext', 'indexes': [], 'children': []},
         {'id': 42,
          'name': 'parseNumber:while_1 ? [2]',
          'indexes': [1],
          'children': [{'id': 43, 'name': 'peek', 'indexes': [], 'children': []},
           {'id': 44,
            'name': 'parseNumber:if_1 = 1#[2, -1]',
            'indexes': [],
            'children': []}]},
         {'id': 45, 'name': 'hasNext', 'indexes': [], 'children': []},
         {'id': 46,
          'name': 'parseNumber:while_1 ? [3]',
          'indexes': [2],
          'children': [{'id': 47, 'name': 'peek', 'indexes': [], 'children': []},
           {'id': 48,
            'name': 'parseNumber:if_1 = 1#[3, -1]',
            'indexes': [],
            'children': []}]},
         {'id': 49, 'name': 'hasNext', 'indexes': [], 'children': []}]},
       32: {'id': 32,
        'name': 'skipWhitespace',
        'indexes': [],
        'children': [{'id': 33, 'name': 'hasNext', 'indexes': [], 'children': []},
         {'id': 34,
          'name': 'skipWhitespace:while_1 ? [1]',
          'indexes': [],
          'children': [{'id': 35, 'name': 'peek', 'indexes': [], 'children': []},
           {'id': 36,
            'name': 'skipWhitespace:if_1 = 1#[1, -1]',
            'indexes': [],
            'children': []}]}]},
       37: {'id': 37, 'name': 'hasNext', 'indexes': [], 'children': []},
       38: {'id': 38,
        'name': 'parseNumber:while_1 ? [1]',
        'indexes': [0],
        'children': [{'id': 39, 'name': 'peek', 'indexes': [], 'children': []},
         {'id': 40,
          'name': 'parseNumber:if_1 = 1#[1, -1]',
          'indexes': [],
          'children': []}]},
       41: {'id': 41, 'name': 'hasNext', 'indexes': [], 'children': []},
       42: {'id': 42,
        'name': 'parseNumber:while_1 ? [2]',
        'indexes': [1],
        'children': [{'id': 43, 'name': 'peek', 'indexes': [], 'children': []},
         {'id': 44,
          'name': 'parseNumber:if_1 = 1#[2, -1]',
          'indexes': [],
          'children': []}]},
       45: {'id': 45, 'name': 'hasNext', 'indexes': [], 'children': []},
       46: {'id': 46,
        'name': 'parseNumber:while_1 ? [3]',
        'indexes': [2],
        'children': [{'id': 47, 'name': 'peek', 'indexes': [], 'children': []},
         {'id': 48,
          'name': 'parseNumber:if_1 = 1#[3, -1]',
          'indexes': [],
          'children': []}]},
       49: {'id': 49, 'name': 'hasNext', 'indexes': [], 'children': []},
       33: {'id': 33, 'name': 'hasNext', 'indexes': [], 'children': []},
       34: {'id': 34,
        'name': 'skipWhitespace:while_1 ? [1]',
        'indexes': [],
        'children': [{'id': 35, 'name': 'peek', 'indexes': [], 'children': []},
         {'id': 36,
          'name': 'skipWhitespace:if_1 = 1#[1, -1]',
          'indexes': [],
          'children': []}]},
       35: {'id': 35, 'name': 'peek', 'indexes': [], 'children': []},
       36: {'id': 36,
        'name': 'skipWhitespace:if_1 = 1#[1, -1]',
        'indexes': [],
        'children': []},
       39: {'id': 39, 'name': 'peek', 'indexes': [], 'children': []},
       40: {'id': 40,
        'name': 'parseNumber:if_1 = 1#[1, -1]',
        'indexes': [],
        'children': []},
       43: {'id': 43, 'name': 'peek', 'indexes': [], 'children': []},
       44: {'id': 44,
        'name': 'parseNumber:if_1 = 1#[2, -1]',
        'indexes': [],
        'children': []},
       47: {'id': 47, 'name': 'peek', 'indexes': [], 'children': []},
       48: {'id': 48,
        'name': 'parseNumber:if_1 = 1#[3, -1]',
        'indexes': [],
        'children': []},
       51: {'id': 51,
        'name': 'skipWhitespace',
        'indexes': [],
        'children': [{'id': 52, 'name': 'hasNext', 'indexes': [], 'children': []}]},
       53: {'id': 53, 'name': 'peek', 'indexes': [], 'children': []},
       54: {'id': 54,
        'name': 'parseMultiplication:if_1 = 2#[1, -1]',
        'indexes': [],
        'children': []},
       52: {'id': 52, 'name': 'hasNext', 'indexes': [], 'children': []},
       56: {'id': 56,
        'name': 'skipWhitespace',
        'indexes': [],
        'children': [{'id': 57, 'name': 'hasNext', 'indexes': [], 'children': []}]},
       58: {'id': 58, 'name': 'peek', 'indexes': [], 'children': []},
       59: {'id': 59,
        'name': 'parseAddition:if_1 = 2#[1, -1]',
        'indexes': [],
        'children': []},
       57: {'id': 57, 'name': 'hasNext', 'indexes': [], 'children': []},
       61: {'id': 61, 'name': 'hasNext', 'indexes': [], 'children': []}}
      
      . . .
      In [95]:
      xxxxxxxxxx
      
      8
       
      1
      def wrap_input(istr):
      
      2
          def extract_node(node, id):
      
      3
              symbol = str(node['id'])
      
      4
              children = node['children']
      
      5
              annotation = str(node['name'])
      
      6
              indexes = repr(tuple([istr[i] for i in node['indexes']]))
      
      7
              return "%s %s" % (annotation, indexes), children, ''
      
      8
          return extract_node
      
      executed in 5ms, finished 04:51:44 2019-08-15
      . . .
      In [96]:
      xxxxxxxxxx
      
      1
       
      1
      %top extract_node1 = wrap_input(calc_trace[0]['inputstr'])
      
      executed in 4ms, finished 04:51:44 2019-08-15
      . . .
      In [97]:
      xxxxxxxxxx
      
      1
       
      1
      %top zoom(display_tree(calc_method_tree1[0], extract_node=extract_node1))
      
      executed in 104ms, finished 04:51:44 2019-08-15
      Out[97]:
      . . .
      In [98]:
      xxxxxxxxxx
      
      1
       
      1
      %top extract_node1 = wrap_input(mathexpr_trace[0]['inputstr'])
      
      executed in 7ms, finished 04:51:45 2019-08-15
      . . .
      In [99]:
      xxxxxxxxxx
      
      1
       
      1
      %top zoom(display_tree(mathexpr_method_tree1[0], extract_node=extract_node1))
      
      executed in 108ms, finished 04:51:45 2019-08-15
      Out[99]:
      . . .
      1
       
      1
      We define `to_node()` a convenience function that, given a list of _contiguous_ indexes and original string, translates it to a leaf node of a tree (that corresponds to the derivation tree syntax in the Fuzzingbook) with a string, empty children, and starting node and ending node.
      

      We define to_node() a convenience function that, given a list of contiguous indexes and original string, translates it to a leaf node of a tree (that corresponds to the derivation tree syntax in the Fuzzingbook) with a string, empty children, and starting node and ending node.

      1
       
      1
      Convert a list of indexes to a corresponding terminal tree node
      

      Convert a list of indexes to a corresponding terminal tree node

      In [100]:
      xxxxxxxxxx
      
      5
       
      1
      def to_node(idxes, my_str):
      
      2
          assert len(idxes) == idxes[-1] - idxes[0] + 1
      
      3
          assert min(idxes) == idxes[0]
      
      4
          assert max(idxes) == idxes[-1]
      
      5
          return my_str[idxes[0]:idxes[-1] + 1], [], idxes[0], idxes[-1]
      
      executed in 10ms, finished 04:51:45 2019-08-15
      . . .
      1
       
      1
      Here is how one would use it.
      

      Here is how one would use it.

      In [101]:
      xxxxxxxxxx
      
      1
       
      1
      %top to_node(calc_method_tree1[6]['indexes'], calc_trace[0]['inputstr'])
      
      executed in 12ms, finished 04:51:45 2019-08-15
      Out[101]:
      ('9', [], 0, 0)
      
      . . .
      In [102]:
      xxxxxxxxxx
      
      2
       
      1
      from operator import itemgetter
      
      2
      import itertools as it
      
      executed in 6ms, finished 04:51:45 2019-08-15
      . . .
      1
       
      1
      We now need to identify the terminal (leaf) nodes. For that, we want to group contiguous letters in a node together, and call it a leaf node. So, convert our list of indexes to lists of contiguous indexes first, then convert them to terminal tree nodes. Then, return a set of one level child nodes with contiguous chars from indexes.
      

      We now need to identify the terminal (leaf) nodes. For that, we want to group contiguous letters in a node together, and call it a leaf node. So, convert our list of indexes to lists of contiguous indexes first, then convert them to terminal tree nodes. Then, return a set of one level child nodes with contiguous chars from indexes.

      In [103]:
      xxxxxxxxxx
      
      7
       
      1
      def indexes_to_children(indexes, my_str):
      
      2
          lst = [
      
      3
              list(map(itemgetter(1), g))
      
      4
              for k, g in it.groupby(enumerate(indexes), lambda x: x[0] - x[1])
      
      5
          ]
      
      6
      ​
      
      7
          return [to_node(n, my_str) for n in lst]
      
      executed in 7ms, finished 04:51:45 2019-08-15
      . . .
      In [104]:
      xxxxxxxxxx
      
      1
       
      1
      %top indexes_to_children(calc_method_tree1[6]['indexes'], calc_trace[0]['inputstr'])
      
      executed in 11ms, finished 04:51:45 2019-08-15
      Out[104]:
      [('9', [], 0, 0)]
      
      . . .
      1
       
      1
      Finally, we need to remove the overlap from the trees we have so far. The idea is that, given a node, each child node of that node should be uniquely responsible for a specified range of characters, with no overlap allowed between the children. The starting of the first child to ending of the last child will be the range of the node.
      

      Finally, we need to remove the overlap from the trees we have so far. The idea is that, given a node, each child node of that node should be uniquely responsible for a specified range of characters, with no overlap allowed between the children. The starting of the first child to ending of the last child will be the range of the node.

      2
       
      1
      #### Removing Overlap
      
      2
      If overlap is found, the tie is biased to the later child. That is, the later child gets to keep the range, and the former child is recursively traversed to remove overlaps from its children. If a child is completely included in the overlap, the child is excised. A few convenience functions first:
      

      1.6.1.3  Removing Overlap¶

      If overlap is found, the tie is biased to the later child. That is, the later child gets to keep the range, and the former child is recursively traversed to remove overlaps from its children. If a child is completely included in the overlap, the child is excised. A few convenience functions first:

      In [105]:
      xxxxxxxxxx
      
      2
       
      1
      def does_item_overlap(s, e, s_, e_):
      
      2
          return (s_ >= s and s_ <= e) or (e_ >= s and e_ <= e) or (s_ <= s and e_ >= e)
      
      executed in 7ms, finished 04:51:45 2019-08-15
      . . .
      In [106]:
      xxxxxxxxxx
      
      2
       
      1
      def is_second_item_included(s, e, s_, e_):
      
      2
          return (s_ >= s and e_ <= e)
      
      executed in 6ms, finished 04:51:45 2019-08-15
      . . .
      In [107]:
      xxxxxxxxxx
      
      2
       
      1
      def has_overlap(ranges, s_, e_):
      
      2
          return {(s, e) for (s, e) in ranges if does_item_overlap(s, e, s_, e_)}
      
      executed in 6ms, finished 04:51:45 2019-08-15
      . . .
      In [108]:
      xxxxxxxxxx
      
      2
       
      1
      def is_included(ranges, s_, e_):
      
      2
          return {(s, e) for (s, e) in ranges if is_second_item_included(s, e, s_, e_)}
      
      executed in 7ms, finished 04:51:45 2019-08-15
      . . .
      In [109]:
      xxxxxxxxxx
      
      23
       
      1
      def remove_overlap_from(original_node, orange):
      
      2
          node, children, start, end = original_node
      
      3
          new_children = []
      
      4
          if not children:
      
      5
              return None
      
      6
          start = -1
      
      7
          end = -1
      
      8
          for child in children:
      
      9
              if does_item_overlap(*child[2:4], *orange):
      
      10
                  new_child = remove_overlap_from(child, orange)
      
      11
                  if new_child: # and new_child[1]:
      
      12
                      if start == -1: start = new_child[2]
      
      13
                      new_children.append(new_child)
      
      14
                      end = new_child[3]
      
      15
              else:
      
      16
                  new_children.append(child)
      
      17
                  if start == -1: start = child[2]
      
      18
                  end = child[3]
      
      19
          if not new_children:
      
      20
              return None
      
      21
          assert start != -1
      
      22
          assert end != -1
      
      23
          return (node, new_children, start, end)
      
      executed in 10ms, finished 04:51:45 2019-08-15
      . . .
      1
       
      1
      Verify that there is no overlap.
      

      Verify that there is no overlap.

      In [110]:
      xxxxxxxxxx
      
      28
       
      1
      def no_overlap(arr):
      
      2
          my_ranges = {}
      
      3
          for a in arr:
      
      4
              _, _, s, e = a
      
      5
              included = is_included(my_ranges, s, e)
      
      6
              if included:
      
      7
                  continue  # we will fill up the blanks later.
      
      8
              else:
      
      9
                  overlaps = has_overlap(my_ranges, s, e)
      
      10
                  if overlaps:
      
      11
                      # unlike include which can happen only once in a set of
      
      12
                      # non-overlapping ranges, overlaps can happen on multiple parts.
      
      13
                      # The rule is, the later child gets the say. So, we recursively
      
      14
                      # remove any ranges that overlap with the current one from the
      
      15
                      # overlapped range.
      
      16
                      assert len(overlaps) == 1
      
      17
                      oitem = list(overlaps)[0]
      
      18
                      v = remove_overlap_from(my_ranges[oitem], (s,e))
      
      19
                      del my_ranges[oitem]
      
      20
                      if v:
      
      21
                          my_ranges[v[2:4]] = v
      
      22
                      my_ranges[(s, e)] = a
      
      23
                  else:
      
      24
                      my_ranges[(s, e)] = a
      
      25
          res = my_ranges.values()
      
      26
          # assert no overlap, and order by starting index
      
      27
          s = sorted(res, key=lambda x: x[2])
      
      28
          return s
      
      executed in 9ms, finished 04:51:45 2019-08-15
      . . .
      3
       
      1
      #### Generate derivation tree
      
      2
      ​
      
      3
      Convert a mapped tree to the _fuzzingbook_ style derivation tree.
      

      1.6.1.4  Generate derivation tree¶

      Convert a mapped tree to the fuzzingbook style derivation tree.

      In [111]:
      xxxxxxxxxx
      
      23
       
      1
      def to_tree(node, my_str):
      
      2
          method_name = ("<%s>" % node['name']) if node['name'] is not None else '<START>'
      
      3
          indexes = node['indexes']
      
      4
          node_children = [to_tree(c, my_str) for c in node.get('children', [])]
      
      5
          idx_children = indexes_to_children(indexes, my_str)
      
      6
          children = no_overlap([c for c in node_children if c is not None] + idx_children)
      
      7
          if not children:
      
      8
              return None
      
      9
          start_idx = children[0][2]
      
      10
          end_idx = children[-1][3]
      
      11
          si = start_idx
      
      12
          my_children = []
      
      13
          # FILL IN chars that we did not compare. This is likely due to an i + n
      
      14
          # instruction.
      
      15
          for c in children:
      
      16
              if c[2] != si:
      
      17
                  sbs = my_str[si: c[2]]
      
      18
                  my_children.append((sbs, [], si, c[2] - 1))
      
      19
              my_children.append(c)
      
      20
              si = c[3] + 1
      
      21
      ​
      
      22
          m = (method_name, my_children, start_idx, end_idx)
      
      23
          return m
      
      executed in 13ms, finished 04:51:45 2019-08-15
      . . .
      In [112]:
      xxxxxxxxxx
      
      1
       
      1
      %top zoom(display_tree(to_tree(calc_method_tree1[0], calc_trace[0]['inputstr'])))
      
      executed in 102ms, finished 04:51:45 2019-08-15
      Out[112]:
      . . .
      In [113]:
      xxxxxxxxxx
      
      1
       
      1
      %top zoom(display_tree(to_tree(mathexpr_method_tree1[0], mathexpr_trace[0]['inputstr'])))
      
      executed in 95ms, finished 04:51:45 2019-08-15
      Out[113]:
      . . .
      3
       
      1
      ### The Complete Miner
      
      2
      ​
      
      3
      We now put everything together. The `miner()` takes the traces, produces trees out of them, and verifies that the trees actually correspond to the input.
      

      1.6.2  The Complete Miner¶

      We now put everything together. The miner() takes the traces, produces trees out of them, and verifies that the trees actually correspond to the input.

      In [114]:
      xxxxxxxxxx
      
      1
       
      1
      from fuzzingbook.GrammarFuzzer import tree_to_string
      
      executed in 6ms, finished 04:51:45 2019-08-15
      . . .
      In [115]:
      xxxxxxxxxx
      
      18
       
      1
      def miner(call_traces):
      
      2
          my_trees = []
      
      3
          for call_trace in call_traces:
      
      4
              method_map = call_trace['method_map']
      
      5
      ​
      
      6
              first, method_tree = reconstruct_method_tree(method_map)
      
      7
              comparisons = call_trace['comparisons']
      
      8
              attach_comparisons(method_tree, last_comparisons(comparisons))
      
      9
      ​
      
      10
              my_str = call_trace['inputstr']
      
      11
      ​
      
      12
              #print("INPUT:", my_str, file=sys.stderr)
      
      13
              tree = to_tree(method_tree[first], my_str)
      
      14
              #print("RECONSTRUCTED INPUT:", tree_to_string(tree), file=sys.stderr)
      
      15
              my_tree = {'tree': tree, 'original': call_trace['original'], 'arg': call_trace['arg']}
      
      16
              assert tree_to_string(tree) == my_str
      
      17
              my_trees.append(my_tree)
      
      18
          return my_trees
      
      executed in 11ms, finished 04:51:45 2019-08-15
      . . .
      1
       
      1
      Using the `miner()`
      

      Using the miner()

      In [116]:
      xxxxxxxxxx
      
      3
       
      1
      %top mined_calc_trees = miner(calc_trace)
      
      2
      %top calc_tree = mined_calc_trees[0]
      
      3
      %top zoom(display_tree(calc_tree['tree']))
      
      executed in 91ms, finished 04:51:45 2019-08-15
      Out[116]:
      . . .
      In [117]:
      xxxxxxxxxx
      
      3
       
      1
      %top mined_mathexpr_trees = miner(mathexpr_trace)
      
      2
      %top mathexpr_tree = mined_mathexpr_trees[1]
      
      3
      %top zoom(display_tree(mathexpr_tree['tree']))
      
      executed in 92ms, finished 04:51:45 2019-08-15
      Out[117]:
      . . .
      1
       
      1
      ## Generalize Iterations
      

      1.7  Generalize Iterations¶

      17
       
      1
      One of the problems that you can notice in the tree generated is that each `while` iterations get a different identifier. e.g. 
      
      2
      ```
      
      3
                      ('<parse_expr:while_1 ? [2]>', [('+', [], 5, 5)], 5, 5),
      
      4
                      ('<parse_expr:while_1 ? [3]>',
      
      5
                       [('<parse_expr:if_1 + 0#[3, -1]>',
      
      6
                         [('<parse_num>',
      
      7
                           [('<is_digit>', [('7', [], 6, 6)], 6, 6),
      
      8
                            ('<is_digit>', [('2', [], 7, 7)], 7, 7)],
      
      9
                           6,
      
      10
                           7)],
      
      11
                         6,
      
      12
                         7)],
      
      13
      ​
      
      14
      ```
      
      15
      The separate identifiers are intentional because we do not yet know the actual dependencies between different iterations such as closing quotes, or closing braces or parenthesis. However, this creates a problem when we mine grammar because we need to match up the compatible nodes.
      
      16
      ​
      
      17
      Generalizer does it through actively doing surgery on the tree to see whether a node is replaceable with another.
      

      One of the problems that you can notice in the tree generated is that each while iterations get a different identifier. e.g.

                      ('<parse_expr:while_1 ? [2]>', [('+', [], 5, 5)], 5, 5),
                      ('<parse_expr:while_1 ? [3]>',
                       [('<parse_expr:if_1 + 0#[3, -1]>',
                         [('<parse_num>',
                           [('<is_digit>', [('7', [], 6, 6)], 6, 6),
                            ('<is_digit>', [('2', [], 7, 7)], 7, 7)],
                           6,
                           7)],
                         6,
                         7)],
      
      

      The separate identifiers are intentional because we do not yet know the actual dependencies between different iterations such as closing quotes, or closing braces or parenthesis. However, this creates a problem when we mine grammar because we need to match up the compatible nodes.

      Generalizer does it through actively doing surgery on the tree to see whether a node is replaceable with another.

      In [118]:
      xxxxxxxxxx
      
      2
       
      1
      import copy
      
      2
      import random
      
      executed in 6ms, finished 04:51:45 2019-08-15
      . . .
      3
       
      1
      ### Checking compatibility of nodes
      
      2
      ​
      
      3
      We first need a few helper functions. The `replace_nodes()` function try to replace the _contents_ of the first node with the _contents_ of the second (That is, the tree that has these nodes will automatically be modified), collect the produced string from the tree, and reset any changes. The arguments are tuples with the following format: (node, file_name, tree)
      

      1.7.1  Checking compatibility of nodes¶

      We first need a few helper functions. The replace_nodes() function try to replace the contents of the first node with the contents of the second (That is, the tree that has these nodes will automatically be modified), collect the produced string from the tree, and reset any changes. The arguments are tuples with the following format: (node, file_name, tree)

      In [119]:
      xxxxxxxxxx
      
      16
       
      1
      def replace_nodes(a2, a1):
      
      2
          node2, _, t2 = a2
      
      3
          node1, _, t1 = a1
      
      4
          str2_old = tree_to_string(t2)
      
      5
          old = copy.copy(node2)
      
      6
          node2.clear()
      
      7
          for n in node1:
      
      8
              node2.append(n)
      
      9
          str2_new = tree_to_string(t2)
      
      10
          assert str2_old != str2_new
      
      11
          node2.clear()
      
      12
          for n in old:
      
      13
              node2.append(n)
      
      14
          str2_last = tree_to_string(t2)
      
      15
          assert str2_old == str2_last
      
      16
          return str2_new
      
      executed in 7ms, finished 04:51:45 2019-08-15
      . . .
      1
       
      1
      Can a given node be replaced with another? The idea is, given two nodes (possibly from two trees), can the first node be replaced by the second, and still result in a valid string?
      

      Can a given node be replaced with another? The idea is, given two nodes (possibly from two trees), can the first node be replaced by the second, and still result in a valid string?

      In [120]:
      xxxxxxxxxx
      
      5
       
      1
      def is_compatible(a1, a2, module):
      
      2
          if tree_to_string(a1[0]) == tree_to_string(a2[0]):
      
      3
              return True
      
      4
          my_string = replace_nodes(a1, a2)
      
      5
          return check(my_string, module)
      
      executed in 7ms, finished 04:51:45 2019-08-15
      . . .
      In [121]:
      xxxxxxxxxx
      
      26
       
      1
      %%var check_src
      
      2
      # [(
      
      3
      import sys, imp
      
      4
      parse_ = imp.new_module('parse_')
      
      5
      ​
      
      6
      def init_module(src):
      
      7
          with open(src) as sf:
      
      8
              exec(sf.read(), parse_.__dict__)
      
      9
      ​
      
      10
      def _check(s):
      
      11
          try:
      
      12
              parse_.main(s)
      
      13
              return True
      
      14
          except:
      
      15
              return False
      
      16
      ​
      
      17
      import sys
      
      18
      def main(args):
      
      19
          init_module(args[0])
      
      20
          if _check(args[1]):
      
      21
              sys.exit(0)
      
      22
          else:
      
      23
              sys.exit(1)
      
      24
      import sys
      
      25
      main(sys.argv[1:])
      
      26
      # )]
      
      executed in 5ms, finished 04:51:45 2019-08-15
      . . .
      In [122]:
      xxxxxxxxxx
      
      4
       
      1
      # [(
      
      2
      with open('build/check.py', 'w+') as f:
      
      3
          print(VARS['check_src'], file=f)
      
      4
      # )]
      
      executed in 5ms, finished 04:51:45 2019-08-15
      . . .
      In [123]:
      xxxxxxxxxx
      
      4
       
      1
      EXEC_MAP = {}
      
      2
      NODE_REGISTER = {}
      
      3
      TREE = None
      
      4
      FILE = None
      
      executed in 7ms, finished 04:51:45 2019-08-15
      . . .
      In [124]:
      xxxxxxxxxx
      
      6
       
      1
      def reset_generalizer():
      
      2
          global NODE_REGISTER, TREE, FILE, EXEC_MAP
      
      3
          NODE_REGISTER={}
      
      4
          TREE = None
      
      5
          FILE = None
      
      6
          EXEC_MAP = {}
      
      executed in 5ms, finished 04:51:45 2019-08-15
      . . .
      In [125]:
      xxxxxxxxxx
      
      1
       
      1
      reset_generalizer()
      
      executed in 6ms, finished 04:51:45 2019-08-15
      . . .
      In [126]:
      xxxxxxxxxx
      
      1
       
      1
      import os.path
      
      executed in 4ms, finished 04:51:45 2019-08-15
      . . .
      In [127]:
      xxxxxxxxxx
      
      11
       
      1
      def check(s, module):
      
      2
          if s in EXEC_MAP: return EXEC_MAP[s]
      
      3
          result = do(["python", "./build/check.py", "subjects/%s" % module, s], shell=False)
      
      4
          with open('build/%s.log' % module, 'a+') as f:
      
      5
              print(s, file=f)
      
      6
              print(' '.join(["python", "./build/check.py", "subjects/%s" % module, s]), file=f)
      
      7
              print(":=", result.returncode, file=f)
      
      8
              print("\n", file=f)
      
      9
          v = (result.returncode == 0)
      
      10
          EXEC_MAP[s] = v
      
      11
          return v
      
      executed in 8ms, finished 04:51:45 2019-08-15
      . . .
      1
       
      1
      #### Using it
      

      1.7.1.1  Using it¶

      In [128]:
      xxxxxxxxxx
      
      3
       
      1
      def to_modifiable(derivation_tree):
      
      2
          node, children, *rest = derivation_tree
      
      3
          return [node, [to_modifiable(c) for c in children], *rest]
      
      executed in 6ms, finished 04:51:45 2019-08-15
      . . .
      In [129]:
      xxxxxxxxxx
      
      2
       
      1
      %top calc_tree_ = to_modifiable(calc_tree['tree'])
      
      2
      %top while_loops = calc_tree_[1][0][1][0][1]
      
      executed in 7ms, finished 04:51:45 2019-08-15
      . . .
      In [130]:
      xxxxxxxxxx
      
      1
       
      1
      %top while_loops[0]
      
      executed in 11ms, finished 04:51:45 2019-08-15
      Out[130]:
      ['<parse_expr:while_1 ? [1]>',
       [['<parse_expr:if_1 = 0#[1, -1]>',
         [['<parse_num>', [['<is_digit>', [['9', [], 0, 0]], 0, 0]], 0, 0]],
         0,
         0]],
       0,
       0]
      
      . . .
      In [131]:
      xxxxxxxxxx
      
      1
       
      1
      %top while_loops[1]
      
      executed in 9ms, finished 04:51:45 2019-08-15
      Out[131]:
      ['<parse_expr:while_1 ? [2]>', [['-', [], 1, 1]], 1, 1]
      
      . . .
      In [132]:
      xxxxxxxxxx
      
      1
       
      1
      %top assert not is_compatible((while_loops[1], 'c.py', calc_tree_), (while_loops[0], 'c.py', calc_tree_), 'calculator.py')
      
      executed in 49ms, finished 04:51:45 2019-08-15
      . . .
      In [133]:
      xxxxxxxxxx
      
      1
       
      1
      %top assert is_compatible((while_loops[0], 'c.py', calc_tree_), (while_loops[2], 'c.py', calc_tree_), 'calculator.py')
      
      executed in 57ms, finished 04:51:45 2019-08-15
      . . .
      1
       
      1
      We need to extract meta information from the names, and update it back. TODO: make the meta info JSON.
      

      We need to extract meta information from the names, and update it back. TODO: make the meta info JSON.

      In [134]:
      xxxxxxxxxx
      
      14
       
      1
      def parse_name(name):
      
      2
          assert name[0] + name[-1] == '<>'
      
      3
          name = name[1:-1]
      
      4
          method, rest = name.split(':')
      
      5
          ctrl_name, space, rest = rest.partition(' ')
      
      6
          can_empty, space, stack = rest.partition(' ')
      
      7
          ctrl, cname = ctrl_name.split('_')
      
      8
          if ':while_' in name:
      
      9
              method_stack = json.loads(stack)
      
      10
              return method, ctrl, int(cname), 0, can_empty, method_stack
      
      11
          elif ':if_' in name:
      
      12
              num, mstack = stack.split('#')
      
      13
              method_stack = json.loads(mstack)
      
      14
              return method, ctrl, int(cname), num, can_empty, method_stack
      
      executed in 15ms, finished 04:51:45 2019-08-15
      . . .
      In [135]:
      xxxxxxxxxx
      
      1
       
      1
      %top [parse_name(w[0]) for w in while_loops]
      
      executed in 12ms, finished 04:51:45 2019-08-15
      Out[135]:
      [('parse_expr', 'while', 1, 0, '?', [1]),
       ('parse_expr', 'while', 1, 0, '?', [2]),
       ('parse_expr', 'while', 1, 0, '?', [3]),
       ('parse_expr', 'while', 1, 0, '?', [4]),
       ('parse_expr', 'while', 1, 0, '?', [5]),
       ('parse_expr', 'while', 1, 0, '?', [6]),
       ('parse_expr', 'while', 1, 0, '?', [7])]
      
      . . .
      In [136]:
      xxxxxxxxxx
      
      5
       
      1
      def unparse_name(method, ctrl, name, num, can_empty, cstack):
      
      2
          if ctrl == 'while':
      
      3
              return "<%s:%s_%s %s %s>" % (method, ctrl, name, can_empty, json.dumps(cstack))
      
      4
          else:
      
      5
              return "<%s:%s_%s %s %s#%s>" % (method, ctrl, name, can_empty, num, json.dumps(cstack))
      
      executed in 8ms, finished 04:51:45 2019-08-15
      . . .
      1
       
      1
      Verify that parsing and unparsing works.
      

      Verify that parsing and unparsing works.

      In [137]:
      xxxxxxxxxx
      
      1
       
      1
      %top assert all(unparse_name(*parse_name(w[0])) == w[0] for w in while_loops)
      
      executed in 8ms, finished 04:51:45 2019-08-15
      . . .
      3
       
      1
      ### Propagate rename of the `while` node up the child nodes.
      
      2
      ​
      
      3
      The `update_stack()` when given a node, and a new name, recursively updates the method stack in the children.
      

      1.7.2  Propagate rename of the while node up the child nodes.¶

      The update_stack() when given a node, and a new name, recursively updates the method stack in the children.

      In [138]:
      xxxxxxxxxx
      
      11
       
      1
      def update_stack(node, at, new_name):
      
      2
          nname, children, *rest = node
      
      3
          if not (':if_' in nname or ':while_' in nname):
      
      4
              return
      
      5
          method, ctrl, cname, num, can_empty, cstack = parse_name(nname)
      
      6
          cstack[at] = new_name
      
      7
          name = unparse_name(method, ctrl, cname, num, can_empty, cstack)
      
      8
          #assert '?' not in name
      
      9
          node[0] = name
      
      10
          for c in children:
      
      11
              update_stack(c, at, new_name)
      
      executed in 11ms, finished 04:51:46 2019-08-15
      . . .
      1
       
      1
      Update the node name once we have identified that it corresponds to a global name.
      

      Update the node name once we have identified that it corresponds to a global name.

      In [139]:
      xxxxxxxxxx
      
      16
       
      1
      def update_name(k_m, my_id, seen):
      
      2
          # fixup k_m with what is in my_id, and update seen.
      
      3
          original = k_m[0]
      
      4
          method, ctrl, cname, num, can_empty, cstack = parse_name(original)
      
      5
          #assert can_empty != '?'
      
      6
          cstack[-1] = float('%d.0' % my_id)
      
      7
          name = unparse_name(method, ctrl, cname, num, can_empty, cstack)
      
      8
          seen[k_m[0]] = name
      
      9
          k_m[0] = name
      
      10
      ​
      
      11
          # only replace it at the len(cstack) -1 the
      
      12
          # until the first non-cf token
      
      13
          children = []
      
      14
          for c in k_m[1]:
      
      15
              update_stack(c, len(cstack)-1, cstack[-1])
      
      16
          return name, k_m
      
      executed in 10ms, finished 04:51:46 2019-08-15
      . . .
      1
       
      1
      Note that the rename happens only within the current method stack. That is, it does not propagate across method calls. Here is how one would use it.
      

      Note that the rename happens only within the current method stack. That is, it does not propagate across method calls. Here is how one would use it.

      In [140]:
      xxxxxxxxxx
      
      1
       
      1
      %top while_loops[2]
      
      executed in 12ms, finished 04:51:46 2019-08-15
      Out[140]:
      ['<parse_expr:while_1 ? [3]>',
       [['<parse_expr:if_1 = 2#[3, -1]>',
         [['<parse_paren>',
           [['(', [], 2, 2],
            ['<parse_expr>',
             [['<parse_expr:while_1 ? [1]>',
               [['<parse_expr:if_1 = 0#[1, -1]>',
                 [['<parse_num>',
                   [['<is_digit>', [['1', [], 3, 3]], 3, 3],
                    ['<is_digit>', [['6', [], 4, 4]], 4, 4]],
                   3,
                   4]],
                 3,
                 4]],
               3,
               4],
              ['<parse_expr:while_1 ? [2]>', [['+', [], 5, 5]], 5, 5],
              ['<parse_expr:while_1 ? [3]>',
               [['<parse_expr:if_1 = 0#[3, -1]>',
                 [['<parse_num>',
                   [['<is_digit>', [['7', [], 6, 6]], 6, 6],
                    ['<is_digit>', [['2', [], 7, 7]], 7, 7]],
                   6,
                   7]],
                 6,
                 7]],
               6,
               7]],
             3,
             7],
            [')', [], 8, 8]],
           2,
           8]],
         2,
         8]],
       2,
       8]
      
      . . .
      1
       
      1
      We update the iteration number `3` with a global id `4.0`
      

      We update the iteration number 3 with a global id 4.0

      In [141]:
      xxxxxxxxxx
      
      2
       
      1
      %top name, node = update_name(while_loops[2], 4, {})
      
      2
      %top node
      
      executed in 12ms, finished 04:51:46 2019-08-15
      Out[141]:
      ['<parse_expr:while_1 ? [4.0]>',
       [['<parse_expr:if_1 = 2#[4.0, -1]>',
         [['<parse_paren>',
           [['(', [], 2, 2],
            ['<parse_expr>',
             [['<parse_expr:while_1 ? [1]>',
               [['<parse_expr:if_1 = 0#[1, -1]>',
                 [['<parse_num>',
                   [['<is_digit>', [['1', [], 3, 3]], 3, 3],
                    ['<is_digit>', [['6', [], 4, 4]], 4, 4]],
                   3,
                   4]],
                 3,
                 4]],
               3,
               4],
              ['<parse_expr:while_1 ? [2]>', [['+', [], 5, 5]], 5, 5],
              ['<parse_expr:while_1 ? [3]>',
               [['<parse_expr:if_1 = 0#[3, -1]>',
                 [['<parse_num>',
                   [['<is_digit>', [['7', [], 6, 6]], 6, 6],
                    ['<is_digit>', [['2', [], 7, 7]], 7, 7]],
                   6,
                   7]],
                 6,
                 7]],
               6,
               7]],
             3,
             7],
            [')', [], 8, 8]],
           2,
           8]],
         2,
         8]],
       2,
       8]
      
      . . .
      3
       
      1
      ##### replace  a set of nodes
      
      2
      ​
      
      3
      We want to replace the `while` loop iteration identifiers with a global identifier. For that, we are given a list of nodes that are compatible with global ones. We first extract the iteration id from the global node, and apply it on the `while` node under consideration.
      
      replace a set of nodes¶

      We want to replace the while loop iteration identifiers with a global identifier. For that, we are given a list of nodes that are compatible with global ones. We first extract the iteration id from the global node, and apply it on the while node under consideration.

      In [142]:
      xxxxxxxxxx
      
      16
       
      1
      def replace_stack_and_mark_star(to_replace):
      
      2
          # remember, we only replace whiles.
      
      3
          for (i, j) in to_replace:
      
      4
              method1, ctrl1, cname1, num1, can_empty1, cstack1 = parse_name(i[0])
      
      5
              method2, ctrl2, cname2, num2, can_empty2, cstack2 = parse_name(j[0])
      
      6
              assert method1 == method2
      
      7
              assert ctrl1 == ctrl2
      
      8
              assert cname1 == cname2
      
      9
              #assert can_empty2 != '?'
      
      10
      ​
      
      11
              # fixup the can_empty
      
      12
              new_name = unparse_name(method1, ctrl1, cname1, num1, can_empty2, cstack1)
      
      13
              i[0] = new_name
      
      14
              assert len(cstack1) == len(cstack2)
      
      15
              update_stack(i, len(cstack2)-1, cstack2[-1])
      
      16
          to_replace.clear()
      
      executed in 11ms, finished 04:51:46 2019-08-15
      . . .
      2
       
      1
      ### Generalize a given set of loops
      
      2
      The main workhorse. It generalizes the looping constructs. It is given a set of while loops with the same label under the current node. TODO: Refactor when we actually have time.
      

      1.7.3  Generalize a given set of loops¶

      The main workhorse. It generalizes the looping constructs. It is given a set of while loops with the same label under the current node. TODO: Refactor when we actually have time.

      2
       
      1
      ##### Helper: node inclusion
      
      2
      Checking for node inclusion. We do not want to try including a first node in second if the first node already contains the second. It will lead to infinite loop on `tree_to_string()`.
      
      Helper: node inclusion¶

      Checking for node inclusion. We do not want to try including a first node in second if the first node already contains the second. It will lead to infinite loop on tree_to_string().

      In [143]:
      xxxxxxxxxx
      
      4
       
      1
      def node_include(i, j):
      
      2
          name_i, children_i, s_i, e_i = i
      
      3
          name_j, children_j, s_j, e_j = j
      
      4
          return s_i <= s_j and e_i >= e_j
      
      executed in 4ms, finished 04:51:46 2019-08-15
      . . .
      2
       
      1
      ##### Helper: sorting
      
      2
      Ordering nodes by their highest complexity to avoid spurious can-replace answers.
      
      Helper: sorting¶

      Ordering nodes by their highest complexity to avoid spurious can-replace answers.

      In [144]:
      xxxxxxxxxx
      
      8
       
      1
      def num_tokens(v, s):
      
      2
          name, child, *rest = v
      
      3
          s.add(name)
      
      4
          [num_tokens(i, s) for i in child]
      
      5
          return len(s)
      
      6
      ​
      
      7
      def s_fn(v):
      
      8
          return num_tokens(v[0], set())
      
      executed in 6ms, finished 04:51:46 2019-08-15
      . . .
      In [145]:
      xxxxxxxxxx
      
      1
       
      1
      MAX_SAMPLES = 1 # with reasonably complex inputs, this is sufficent if we do the surgery both ways.
      
      executed in 4ms, finished 04:51:46 2019-08-15
      . . .
      1
       
      1
      First, we check whether any of the loops we have are compatible with the globally registered loops in `while_register`.
      

      First, we check whether any of the loops we have are compatible with the globally registered loops in while_register.

      In [146]:
      xxxxxxxxxx
      
      40
       
      1
      def check_registered_loops_for_compatibility(idx_map, while_register, module):
      
      2
          seen = {}
      
      3
          to_replace = []
      
      4
          idx_keys = sorted(idx_map.keys())
      
      5
          for while_key, f in while_register[0]:
      
      6
              # try sampling here.
      
      7
              my_values = while_register[0][(while_key, f)]
      
      8
              v_ = random.choice(my_values)
      
      9
              for k in idx_keys:
      
      10
                  k_m = idx_map[k]
      
      11
                  if k_m[0] in seen: continue
      
      12
                  if len(my_values) > MAX_SAMPLES:
      
      13
                      lst = [v for v in my_values if not node_include(v[0], k_m)]
      
      14
                      values = sorted(lst, key=s_fn, reverse=True)[0:MAX_SAMPLES]
      
      15
                  else:
      
      16
                      values = my_values
      
      17
      ​
      
      18
                  # all values in v should be tried.
      
      19
                  replace = 0
      
      20
                  for v in values:
      
      21
                      assert v[0][0] == v_[0][0]
      
      22
                      if f != FILE or not node_include(v[0], k_m): # if not k_m includes v
      
      23
                          a = is_compatible((k_m, FILE, TREE), v, module)
      
      24
                          if not a:
      
      25
                              replace = 0
      
      26
                              break
      
      27
                          else:
      
      28
                              replace += 1
      
      29
                      if f != FILE or not node_include(k_m, v[0]):
      
      30
                          b = is_compatible(v, (k_m, FILE, TREE), module)
      
      31
                          if not b:
      
      32
                              replace = 0
      
      33
                              break
      
      34
                          else:
      
      35
                              replace += 1
      
      36
                  # at least one needs to vouch, and all capable needs to agree.
      
      37
                  if replace:
      
      38
                      to_replace.append((k_m, v_[0])) # <- replace k_m by v
      
      39
                      seen[k_m[0]] = True
      
      40
          replace_stack_and_mark_star(to_replace)
      
      executed in 15ms, finished 04:51:46 2019-08-15
      . . .
      1
       
      1
      Next, for all the loops that remain, check if they can be deleted. If they can be, we want to place `Epsilon == *` in place of `?` in the `can_empty` position.
      

      Next, for all the loops that remain, check if they can be deleted. If they can be, we want to place Epsilon == * in place of ? in the can_empty position.

      In [147]:
      xxxxxxxxxx
      
      11
       
      1
      def can_the_loop_be_deleted(idx_map, while_register, module):
      
      2
          idx_keys = sorted(idx_map.keys())
      
      3
          for i in idx_keys:
      
      4
              i_m = idx_map[i]
      
      5
              if '.0' in i_m[0]:
      
      6
                  # assert '?' not in i_m[0]
      
      7
                  continue
      
      8
              a = is_compatible((i_m, FILE, TREE), (['', [], 0, 0], FILE, TREE), module)
      
      9
              method1, ctrl1, cname1, num1, can_empty, cstack1 = parse_name(i_m[0])
      
      10
              name = unparse_name(method1, ctrl1, cname1, num1, Epsilon if a else NoEpsilon, cstack1)
      
      11
              i_m[0] = name
      
      executed in 7ms, finished 04:51:46 2019-08-15
      . . .
      3
       
      1
      Next, we check all current loops whether they are compatible with each other. Essentially, we start from the back, and check if the first or second or third ... nodes are compatible with the last node. Then take the second last node and do the same.
      
      2
      ​
      
      3
      If they are, we use the same name for all compatible nodes.
      

      Next, we check all current loops whether they are compatible with each other. Essentially, we start from the back, and check if the first or second or third ... nodes are compatible with the last node. Then take the second last node and do the same.

      If they are, we use the same name for all compatible nodes.

      In [148]:
      xxxxxxxxxx
      
      26
       
      1
      def check_current_loops_for_compatibility(idx_map, while_register, module):
      
      2
          to_replace = []
      
      3
          rkeys = sorted(idx_map.keys(), reverse=True)
      
      4
          for i in rkeys: # <- nodes to check for replacement -- started from the back
      
      5
              i_m = idx_map[i]
      
      6
              # assert '?' not in i_m[0]
      
      7
              if '.0' in i_m[0]: continue
      
      8
              j_keys = sorted([j for j in idx_map.keys() if j < i])
      
      9
              for j in j_keys: # <- nodes that we can replace i_m with -- starting from front.
      
      10
                  j_m = idx_map[j]
      
      11
                  # assert '?' not in j_m[0]
      
      12
                  if i_m[0] == j_m[0]: break
      
      13
                  # previous whiles worked.
      
      14
                  replace = False
      
      15
                  if not node_include(j_m, i_m):
      
      16
                      a = is_compatible((i_m, FILE, TREE), (j_m, FILE, TREE), module)
      
      17
                      if not a: continue
      
      18
                      replace = True
      
      19
                  if not node_include(i_m, j_m):
      
      20
                      b = is_compatible((j_m, FILE, TREE), (i_m, FILE, TREE), module)
      
      21
                      if not b: continue
      
      22
                      replace = True
      
      23
                  if replace:
      
      24
                      to_replace.append((i_m, j_m)) # <- replace i_m by j_m
      
      25
                  break
      
      26
          replace_stack_and_mark_star(to_replace)
      
      executed in 10ms, finished 04:51:46 2019-08-15
      . . .
      1
       
      1
      Finally, register all the new while loops discovered.
      

      Finally, register all the new while loops discovered.

      In [149]:
      xxxxxxxxxx
      
      26
       
      1
      def register_new_loops(idx_map, while_register):
      
      2
          idx_keys = sorted(idx_map.keys())
      
      3
          seen = {}
      
      4
          for k in idx_keys:
      
      5
              k_m = idx_map[k]
      
      6
              if ".0" not in k_m[0]:
      
      7
                  if k_m[0] in seen:
      
      8
                      k_m[0] = seen[k_m[0]]
      
      9
                      # and update
      
      10
                      method1, ctrl1, cname1, num1, can_empty1, cstack1 = parse_name(k_m[0])
      
      11
                      update_name(k_m, cstack1[-1], seen)
      
      12
                      continue
      
      13
                  # new! get a brand new name!
      
      14
                  while_register[1] += 1
      
      15
                  my_id = while_register[1]
      
      16
      ​
      
      17
                  original_name = k_m[0]
      
      18
                  #assert '?' not in original_name
      
      19
                  name, new_km = update_name(k_m, my_id, seen)
      
      20
                  #assert '?' not in name
      
      21
                  while_register[0][(name, FILE)] = [(new_km, FILE, TREE)]
      
      22
              else:
      
      23
                  name = k_m[0]
      
      24
                  if (name, FILE) not in while_register[0]:
      
      25
                      while_register[0][(name, FILE)] = []
      
      26
                  while_register[0][(name, FILE)].append((k_m, FILE, TREE))
      
      executed in 12ms, finished 04:51:46 2019-08-15
      . . .
      1
       
      1
      All together.
      

      All together.

      In [150]:
      xxxxxxxxxx
      
      12
       
      1
      def generalize_loop(idx_map, while_register, module):
      
      2
          # First we check the previous while loops
      
      3
          check_registered_loops_for_compatibility(idx_map, while_register, module)
      
      4
      ​
      
      5
          # Check whether any of these can be deleted.
      
      6
          can_the_loop_be_deleted(idx_map, while_register, module)
      
      7
          
      
      8
          # then we check he current while iterations
      
      9
          check_current_loops_for_compatibility(idx_map, while_register, module)
      
      10
      ​
      
      11
          # lastly, update all while names.
      
      12
          register_new_loops(idx_map, while_register)
      
      executed in 4ms, finished 04:51:46 2019-08-15
      . . .
      1
       
      1
      We keep a global registry of nodes, so that we can use the same iteration labels.
      

      We keep a global registry of nodes, so that we can use the same iteration labels.

      In [151]:
      xxxxxxxxxx
      
      1
       
      1
      # NODE_REGISTER = {}
      
      executed in 3ms, finished 04:51:46 2019-08-15
      . . .
      8
       
      1
      ### Collect loops to generalize
      
      2
      The idea is to look through the tree, looking for while loops.
      
      3
      When one sees a while loop, start at one end, and see if the
      
      4
      while iteration index can be replaced by the first one, and vice
      
      5
      versa. If not, try with the second one and so on until the first one
      
      6
      succeeds. When one succeeds, replace the definition of the matching
      
      7
      one with an alternate with the last's definition, and replace the
      
      8
      name of last with the first, and delete last. Here, we only collect the while loops with same labels, with `generalize_loop()` doing the rest.
      

      1.7.4  Collect loops to generalize¶

      The idea is to look through the tree, looking for while loops. When one sees a while loop, start at one end, and see if the while iteration index can be replaced by the first one, and vice versa. If not, try with the second one and so on until the first one succeeds. When one succeeds, replace the definition of the matching one with an alternate with the last's definition, and replace the name of last with the first, and delete last. Here, we only collect the while loops with same labels, with generalize_loop() doing the rest.

      In [152]:
      xxxxxxxxxx
      
      29
       
      1
      def generalize(tree, module):
      
      2
          node, children, *_rest = tree
      
      3
          if node not in NODE_REGISTER:
      
      4
              NODE_REGISTER[node] = {}
      
      5
          register = NODE_REGISTER[node]
      
      6
      ​
      
      7
          for child in children:
      
      8
              generalize(child, module)
      
      9
      ​
      
      10
          idxs = {}
      
      11
          last_while = None
      
      12
          for i,child in enumerate(children):
      
      13
              # now we need to map the while_name here to the ones in node
      
      14
              # register. Essentially, we try to replace each.
      
      15
              if ':while_' not in child[0]:
      
      16
                  continue
      
      17
              while_name = child[0].split(' ')[0]
      
      18
              if last_while is None:
      
      19
                  last_while = while_name
      
      20
                  if while_name not in register:
      
      21
                      register[while_name] = [{}, 0]
      
      22
              else:
      
      23
                  if last_while != while_name:
      
      24
                      # a new while! Generalize the last
      
      25
                      last_while = while_name
      
      26
                      generalize_loop(idxs, register[last_while])
      
      27
              idxs[i] = child
      
      28
          if last_while is not None:
      
      29
              generalize_loop(idxs, register[last_while], module)
      
      executed in 8ms, finished 04:51:46 2019-08-15
      . . .
      1
       
      1
      We need the ability for fairly deep surgery. So we dump and load the mined trees to convert tuples to arrays.
      

      We need the ability for fairly deep surgery. So we dump and load the mined trees to convert tuples to arrays.

      In [153]:
      xxxxxxxxxx
      
      12
       
      1
      def generalize_iter(jtrees, log=False):
      
      2
          global TREE, FILE
      
      3
          new_trees = []
      
      4
          for j in jtrees:
      
      5
              FILE = j['arg']
      
      6
              if log: print(FILE, file=sys.stderr)
      
      7
              sys.stderr.flush()
      
      8
              TREE = to_modifiable(j['tree'])
      
      9
              generalize(TREE, j['original'])
      
      10
              j['tree'] = TREE
      
      11
              new_trees.append(copy.deepcopy(j))
      
      12
          return new_trees
      
      executed in 6ms, finished 04:51:46 2019-08-15
      . . .
      In [154]:
      xxxxxxxxxx
      
      1
       
      1
      from fuzzingbook.GrammarFuzzer import extract_node as extract_node_o
      
      executed in 3ms, finished 04:51:46 2019-08-15
      . . .
      In [155]:
      xxxxxxxxxx
      
      3
       
      1
      %top reset_generalizer()
      
      2
      %top generalized_calc_trees = generalize_iter(mined_calc_trees)
      
      3
      %top zoom(display_tree(generalized_calc_trees[0]['tree'], extract_node=extract_node_o))
      
      executed in 2.26s, finished 04:51:48 2019-08-15
      Out[155]:
      . . .
      In [156]:
      xxxxxxxxxx
      
      3
       
      1
      %top reset_generalizer()
      
      2
      %top generalized_mathexpr_trees = generalize_iter(mined_mathexpr_trees)
      
      3
      %top zoom(display_tree(generalized_mathexpr_trees[1]['tree'], extract_node=extract_node_o))
      
      executed in 1.16s, finished 04:51:49 2019-08-15
      Out[156]:
      . . .
      3
       
      1
      ## Generating a Grammar
      
      2
      ​
      
      3
      Generating a grammar from the generalized derivation trees is pretty simple. Start at the start node, and any node that represents a method or a pseudo method becomes a nonterminal. The children forms alternate expansions for the nonterminal. Since all the keys are compatible, merging the grammar is simply merging the hash map.
      

      1.8  Generating a Grammar¶

      Generating a grammar from the generalized derivation trees is pretty simple. Start at the start node, and any node that represents a method or a pseudo method becomes a nonterminal. The children forms alternate expansions for the nonterminal. Since all the keys are compatible, merging the grammar is simply merging the hash map.

      1
       
      1
      First, we define a pretty printer for grammar.
      

      First, we define a pretty printer for grammar.

      In [157]:
      xxxxxxxxxx
      
      2
       
      1
      import re
      
      2
      RE_NONTERMINAL = re.compile(r'(<[^<> ]*>)')
      
      executed in 7ms, finished 04:51:49 2019-08-15
      . . .
      In [158]:
      xxxxxxxxxx
      
      15
       
      1
      def recurse_grammar(grammar, key, order, canonical):
      
      2
          rules = sorted(grammar[key])
      
      3
          old_len = len(order)
      
      4
          for rule in rules:
      
      5
              if not canonical:
      
      6
                  res =  re.findall(RE_NONTERMINAL, rule)
      
      7
              else:
      
      8
                  res = rule
      
      9
              for token in res:
      
      10
                  if token.startswith('<') and token.endswith('>'):
      
      11
                      if token not in order:
      
      12
                          order.append(token)
      
      13
          new = order[old_len:]
      
      14
          for ckey in new:
      
      15
              recurse_grammar(grammar, ckey, order, canonical)
      
      executed in 8ms, finished 04:51:49 2019-08-15
      . . .
      In [159]:
      xxxxxxxxxx
      
      5
       
      1
      def show_grammar(grammar, start_symbol='<START>', canonical=True):
      
      2
          order = [start_symbol]
      
      3
          recurse_grammar(grammar, start_symbol, order, canonical)
      
      4
          assert len(order) == len(grammar.keys())
      
      5
          return {k: sorted(grammar[k]) for k in order}
      
      executed in 4ms, finished 04:51:49 2019-08-15
      . . .
      1
       
      1
      ### Trees to grammar
      

      1.8.1  Trees to grammar¶

      In [160]:
      xxxxxxxxxx
      
      13
       
      1
      def to_grammar(tree, grammar):
      
      2
          node, children, _, _ = tree
      
      3
          tokens = []
      
      4
          if node not in grammar:
      
      5
              grammar[node] = list()
      
      6
          for c in children:
      
      7
              if c[1] == []:
      
      8
                  tokens.append(c[0])
      
      9
              else:
      
      10
                  tokens.append(c[0])
      
      11
                  to_grammar(c, grammar)
      
      12
          grammar[node].append(tuple(tokens))
      
      13
          return grammar
      
      executed in 7ms, finished 04:51:49 2019-08-15
      . . .
      In [161]:
      xxxxxxxxxx
      
      7
       
      1
      def merge_grammar(g1, g2):
      
      2
          all_keys = set(list(g1.keys()) + list(g2.keys()))
      
      3
          merged = {}
      
      4
          for k in all_keys:
      
      5
              alts = set(g1.get(k, []) + g2.get(k, []))
      
      6
              merged[k] = alts
      
      7
          return {k:[l for l in merged[k]] for k in merged}
      
      executed in 7ms, finished 04:51:49 2019-08-15
      . . .
      In [162]:
      xxxxxxxxxx
      
      8
       
      1
      def convert_to_grammar(my_trees):
      
      2
          grammar = {}
      
      3
          for my_tree in my_trees:
      
      4
              tree = my_tree['tree']
      
      5
              src = my_tree['original']
      
      6
              g = to_grammar(tree, grammar)
      
      7
              grammar = merge_grammar(grammar, g)
      
      8
          return grammar
      
      executed in 6ms, finished 04:51:49 2019-08-15
      . . .
      In [163]:
      xxxxxxxxxx
      
      2
       
      1
      %top calc_grammar = convert_to_grammar(generalized_calc_trees)
      
      2
      %top show_grammar(calc_grammar)
      
      executed in 9ms, finished 04:51:49 2019-08-15
      Out[163]:
      {'<START>': [('<main>',)],
       '<main>': [('<parse_expr>',)],
       '<parse_expr>': [('<parse_expr:while_1 = [1.0]>',),
        ('<parse_expr:while_1 = [1.0]>',
         '<parse_expr:while_1 - [2.0]>',
         '<parse_expr:while_1 = [1.0]>'),
        ('<parse_expr:while_1 = [1.0]>',
         '<parse_expr:while_1 - [2.0]>',
         '<parse_expr:while_1 = [1.0]>',
         '<parse_expr:while_1 - [2.0]>',
         '<parse_expr:while_1 = [1.0]>'),
        ('<parse_expr:while_1 = [1.0]>',
         '<parse_expr:while_1 - [2.0]>',
         '<parse_expr:while_1 = [1.0]>',
         '<parse_expr:while_1 - [2.0]>',
         '<parse_expr:while_1 = [1.0]>',
         '<parse_expr:while_1 - [2.0]>',
         '<parse_expr:while_1 = [1.0]>')],
       '<parse_expr:while_1 = [1.0]>': [('<parse_expr:if_1 = 0#[1.0, -1]>',),
        ('<parse_expr:if_1 = 2#[1.0, -1]>',)],
       '<parse_expr:while_1 - [2.0]>': [('*',), ('+',), ('-',), ('/',)],
       '<parse_expr:if_1 = 0#[1.0, -1]>': [('<parse_num>',)],
       '<parse_expr:if_1 = 2#[1.0, -1]>': [('<parse_paren>',)],
       '<parse_num>': [('<is_digit>',),
        ('<is_digit>', '<is_digit>'),
        ('<is_digit>', '<is_digit>', '<is_digit>')],
       '<is_digit>': [('0',),
        ('1',),
        ('2',),
        ('3',),
        ('4',),
        ('5',),
        ('6',),
        ('7',),
        ('8',),
        ('9',)],
       '<parse_paren>': [('(', '<parse_expr>', ')')]}
      
      . . .
      In [164]:
      xxxxxxxxxx
      
      2
       
      1
      %top mathexpr_grammar = convert_to_grammar(generalized_mathexpr_trees)
      
      2
      %top show_grammar(mathexpr_grammar)
      
      executed in 11ms, finished 04:51:49 2019-08-15
      Out[164]:
      {'<START>': [('<main>',)],
       '<main>': [('<getValue>',)],
       '<getValue>': [('<parseExpression>',)],
       '<parseExpression>': [('<parseAddition>',)],
       '<parseAddition>': [('<parseMultiplication>',),
        ('<parseMultiplication>', '<parseAddition:while_1 - [1.0]>')],
       '<parseMultiplication>': [('<parseParenthesis>',),
        ('<parseParenthesis>', '<parseMultiplication:while_1 - [1.0]>')],
       '<parseAddition:while_1 - [1.0]>': [('+',
         '<parseAddition:if_1 = 0#[1.0, -1]>')],
       '<parseParenthesis>': [('<parseParenthesis:if_1 = 1#[-1]>',),
        ('<skipWhitespace>', '<parseParenthesis:if_1 = 1#[-1]>')],
       '<parseMultiplication:while_1 - [1.0]>': [('<skipWhitespace>',),
        ('<skipWhitespace>', '*', '<parseMultiplication:if_1 = 0#[1.0, -1]>')],
       '<parseParenthesis:if_1 = 1#[-1]>': [('<parseNegative>',)],
       '<skipWhitespace>': [('<skipWhitespace:while_1 - [1.0]>',)],
       '<parseNegative>': [('<parseNegative:if_1 = 1#[-1]>',)],
       '<parseNegative:if_1 = 1#[-1]>': [('<parseValue>',)],
       '<parseValue>': [('<parseValue:if_1 = 0#[-1]>',)],
       '<parseValue:if_1 = 0#[-1]>': [('<parseNumber>',)],
       '<parseNumber>': [('<parseNumber:while_1 - [1.0]>',),
        ('<parseNumber:while_1 - [1.0]>',
         '<parseNumber:while_1 - [1.0]>',
         '<parseNumber:while_1 - [1.0]>')],
       '<parseNumber:while_1 - [1.0]>': [('0',),
        ('1',),
        ('2',),
        ('3',),
        ('4',),
        ('5',)],
       '<skipWhitespace:while_1 - [1.0]>': [(' ',)],
       '<parseMultiplication:if_1 = 0#[1.0, -1]>': [('<parseParenthesis>',)],
       '<parseAddition:if_1 = 0#[1.0, -1]>': [('<parseMultiplication>',)]}
      
      . . .
      1
       
      1
      The grammar generated may still contain meta characters such as `<` and `>`. We need to cleanup these to make it a grammar that is fuzzable using the Fuzzingbook fuzzers.
      

      The grammar generated may still contain meta characters such as < and >. We need to cleanup these to make it a grammar that is fuzzable using the Fuzzingbook fuzzers.

      In [165]:
      xxxxxxxxxx
      
      13
       
      1
      def to_fuzzable_grammar(grammar):
      
      2
          def escape(t):
      
      3
              if ((t[0]+t[-1]) == '<>'):
      
      4
                  return t.replace(' ', '_') 
      
      5
              else:
      
      6
                  return t
      
      7
          new_g = {}
      
      8
          for k in grammar:
      
      9
              new_alt = []
      
      10
              for rule in grammar[k]:
      
      11
                  new_alt.append(''.join([escape(t) for t in rule]))
      
      12
              new_g[k.replace(' ', '_')] = new_alt
      
      13
          return new_g
      
      executed in 9ms, finished 04:51:49 2019-08-15
      . . .
      In [166]:
      xxxxxxxxxx
      
      1
       
      1
      from fuzzingbook import GrammarFuzzer
      
      executed in 5ms, finished 04:51:49 2019-08-15
      . . .
      In [167]:
      xxxxxxxxxx
      
      6
       
      1
      %%top
      
      2
      # [(
      
      3
      gf = GrammarFuzzer.GrammarFuzzer(to_fuzzable_grammar(calc_grammar), start_symbol='<START>')
      
      4
      for i in range(10):
      
      5
          print(gf.fuzz())
      
      6
      # )]
      
      executed in 124ms, finished 04:51:49 2019-08-15
      (6+9-8+0)-((4)-(5))/(4+0)
      5/011
      204/(9/4/5*0)
      10+(((2/4-8))/584*41)/((((3*9*8)*07)))
      (2/(0))/(8-4+2)*(4)*(9/0)
      434/48/9-585
      546-6
      3/315*((2)*(3)-5)/(2/6+1/0)
      (((1/0)*27-(3/8-9+5)))/(6)
      0*9*(1+2-7-0)
      
      . . .
      In [168]:
      xxxxxxxxxx
      
      6
       
      1
      %%top
      
      2
      # [(
      
      3
      gf = GrammarFuzzer.GrammarFuzzer(to_fuzzable_grammar(mathexpr_grammar), start_symbol='<START>')
      
      4
      for i in range(10):
      
      5
          print(gf.fuzz())
      
      6
      # )]
      
      executed in 44ms, finished 04:51:49 2019-08-15
      2 *145
      1 * 2+ 401
       5 
      254 
       301+032
       004 +5
      4 * 2+011
      1+420 
      224
       2
      
      . . .
      1
       
      1
      ### Inserting Empty Alternatives for IF and Loops
      

      1.8.2  Inserting Empty Alternatives for IF and Loops¶

      1
       
      1
      Next, we want to insert empty rules for those loops and conditionals that can be skipped. For loops, the entire sequence has to contain the empty marker.
      

      Next, we want to insert empty rules for those loops and conditionals that can be skipped. For loops, the entire sequence has to contain the empty marker.

      In [169]:
      xxxxxxxxxx
      
      16
       
      1
      def check_empty_rules(grammar):
      
      2
          new_grammar = {}
      
      3
          for k in grammar:
      
      4
              if k in ':if_':
      
      5
                  name, marker = k.split('#')
      
      6
                  if name.endswith(' *'):
      
      7
                      new_grammar[k] = grammar[k].add(('',))
      
      8
                  else:
      
      9
                      new_grammar[k] = grammar[k]
      
      10
              elif k in ':while_':
      
      11
                  # TODO -- we have to check the rules for sequences of whiles.
      
      12
                  # for now, ignore.
      
      13
                  new_grammar[k] = grammar[k]
      
      14
              else:
      
      15
                  new_grammar[k] = grammar[k]
      
      16
          return new_grammar
      
      executed in 6ms, finished 04:51:49 2019-08-15
      . . .
      In [170]:
      xxxxxxxxxx
      
      2
       
      1
      %top ne_calc_grammar = check_empty_rules(calc_grammar)
      
      2
      %top show_grammar(ne_calc_grammar)
      
      executed in 12ms, finished 04:51:49 2019-08-15
      Out[170]:
      {'<START>': [('<main>',)],
       '<main>': [('<parse_expr>',)],
       '<parse_expr>': [('<parse_expr:while_1 = [1.0]>',),
        ('<parse_expr:while_1 = [1.0]>',
         '<parse_expr:while_1 - [2.0]>',
         '<parse_expr:while_1 = [1.0]>'),
        ('<parse_expr:while_1 = [1.0]>',
         '<parse_expr:while_1 - [2.0]>',
         '<parse_expr:while_1 = [1.0]>',
         '<parse_expr:while_1 - [2.0]>',
         '<parse_expr:while_1 = [1.0]>'),
        ('<parse_expr:while_1 = [1.0]>',
         '<parse_expr:while_1 - [2.0]>',
         '<parse_expr:while_1 = [1.0]>',
         '<parse_expr:while_1 - [2.0]>',
         '<parse_expr:while_1 = [1.0]>',
         '<parse_expr:while_1 - [2.0]>',
         '<parse_expr:while_1 = [1.0]>')],
       '<parse_expr:while_1 = [1.0]>': [('<parse_expr:if_1 = 0#[1.0, -1]>',),
        ('<parse_expr:if_1 = 2#[1.0, -1]>',)],
       '<parse_expr:while_1 - [2.0]>': [('*',), ('+',), ('-',), ('/',)],
       '<parse_expr:if_1 = 0#[1.0, -1]>': [('<parse_num>',)],
       '<parse_expr:if_1 = 2#[1.0, -1]>': [('<parse_paren>',)],
       '<parse_num>': [('<is_digit>',),
        ('<is_digit>', '<is_digit>'),
        ('<is_digit>', '<is_digit>', '<is_digit>')],
       '<is_digit>': [('0',),
        ('1',),
        ('2',),
        ('3',),
        ('4',),
        ('5',),
        ('6',),
        ('7',),
        ('8',),
        ('9',)],
       '<parse_paren>': [('(', '<parse_expr>', ')')]}
      
      . . .
      In [171]:
      xxxxxxxxxx
      
      6
       
      1
      %%top
      
      2
      # [(
      
      3
      gf = GrammarFuzzer.GrammarFuzzer(to_fuzzable_grammar(ne_calc_grammar), start_symbol='<START>')
      
      4
      for i in range(10):
      
      5
          print(repr(gf.fuzz()))
      
      6
      # )]
      
      executed in 164ms, finished 04:51:50 2019-08-15
      '20'
      '(0*7*2+5)*(2/1)'
      '((56+1+53)*((905)))'
      '59'
      '(7*(((((8)/((1+3+3)+(6-3*1))))*9*710)))'
      '28/3*75*40'
      '0/((2)/(8+4)/11)/((95/((4*3+6*9))))+6'
      '96-9'
      '(649-(5/1+((((212-76))/608)/4)*66))-18'
      '((2))'
      
      . . .
      In [172]:
      xxxxxxxxxx
      
      2
       
      1
      %top ne_mathexpr_grammar = check_empty_rules(mathexpr_grammar)
      
      2
      %top show_grammar(ne_mathexpr_grammar)
      
      executed in 24ms, finished 04:51:50 2019-08-15
      Out[172]:
      {'<START>': [('<main>',)],
       '<main>': [('<getValue>',)],
       '<getValue>': [('<parseExpression>',)],
       '<parseExpression>': [('<parseAddition>',)],
       '<parseAddition>': [('<parseMultiplication>',),
        ('<parseMultiplication>', '<parseAddition:while_1 - [1.0]>')],
       '<parseMultiplication>': [('<parseParenthesis>',),
        ('<parseParenthesis>', '<parseMultiplication:while_1 - [1.0]>')],
       '<parseAddition:while_1 - [1.0]>': [('+',
         '<parseAddition:if_1 = 0#[1.0, -1]>')],
       '<parseParenthesis>': [('<parseParenthesis:if_1 = 1#[-1]>',),
        ('<skipWhitespace>', '<parseParenthesis:if_1 = 1#[-1]>')],
       '<parseMultiplication:while_1 - [1.0]>': [('<skipWhitespace>',),
        ('<skipWhitespace>', '*', '<parseMultiplication:if_1 = 0#[1.0, -1]>')],
       '<parseParenthesis:if_1 = 1#[-1]>': [('<parseNegative>',)],
       '<skipWhitespace>': [('<skipWhitespace:while_1 - [1.0]>',)],
       '<parseNegative>': [('<parseNegative:if_1 = 1#[-1]>',)],
       '<parseNegative:if_1 = 1#[-1]>': [('<parseValue>',)],
       '<parseValue>': [('<parseValue:if_1 = 0#[-1]>',)],
       '<parseValue:if_1 = 0#[-1]>': [('<parseNumber>',)],
       '<parseNumber>': [('<parseNumber:while_1 - [1.0]>',),
        ('<parseNumber:while_1 - [1.0]>',
         '<parseNumber:while_1 - [1.0]>',
         '<parseNumber:while_1 - [1.0]>')],
       '<parseNumber:while_1 - [1.0]>': [('0',),
        ('1',),
        ('2',),
        ('3',),
        ('4',),
        ('5',)],
       '<skipWhitespace:while_1 - [1.0]>': [(' ',)],
       '<parseMultiplication:if_1 = 0#[1.0, -1]>': [('<parseParenthesis>',)],
       '<parseAddition:if_1 = 0#[1.0, -1]>': [('<parseMultiplication>',)]}
      
      . . .
      In [173]:
      xxxxxxxxxx
      
      6
       
      1
      %%top
      
      2
      # [(
      
      3
      gf = GrammarFuzzer.GrammarFuzzer(to_fuzzable_grammar(ne_mathexpr_grammar), start_symbol='<START>')
      
      4
      for i in range(10):
      
      5
          print(repr(gf.fuzz()))
      
      6
      # )]
      
      executed in 63ms, finished 04:51:50 2019-08-15
      '4 * 1'
      ' 4+ 1'
      ' 051 *155+ 3'
      ' 303 +512 '
      '2+ 2 * 1'
      '124 + 5'
      '444+ 221'
      '4+5 '
      '3 *1'
      '0+233'
      
      . . .
      1
       
      1
      ### Learning Regular Expressions
      

      1.8.3  Learning Regular Expressions¶

      1
       
      1
      We now need to generalize the loops. The idea is to look for patterns exclusively in the similarly named while loops using any of the regular expression learners. For the prototype, we replaced the modified Sequitur with the modified Fernau which gave us better regular expressions than before. The main constraint we have is that we want to avoid repeated execution of program if possible. Fernau algorithm can recover a reasonably approximate regular exression based only on positive data.
      

      We now need to generalize the loops. The idea is to look for patterns exclusively in the similarly named while loops using any of the regular expression learners. For the prototype, we replaced the modified Sequitur with the modified Fernau which gave us better regular expressions than before. The main constraint we have is that we want to avoid repeated execution of program if possible. Fernau algorithm can recover a reasonably approximate regular exression based only on positive data.

      3
       
      1
      #### The modified Fernau algorithm
      
      2
      ​
      
      3
      The Fernau algorithm is from _Algorithms for learning regular expressions from positive data_ by _HenningFernau_. Our algorithm uses a modified form of the Prefix-Tree-Acceptor from Fernau. First we define an LRF buffer of a given size.
      

      1.8.3.1  The modified Fernau algorithm¶

      The Fernau algorithm is from Algorithms for learning regular expressions from positive data by HenningFernau. Our algorithm uses a modified form of the Prefix-Tree-Acceptor from Fernau. First we define an LRF buffer of a given size.

      In [174]:
      xxxxxxxxxx
      
      5
       
      1
      import json
      
      2
      class Buf:
      
      3
          def __init__(self, size):
      
      4
              self.size = size
      
      5
              self.items = [None] * self.size
      
      executed in 4ms, finished 04:51:50 2019-08-15
      . . .
      1
       
      1
      The `add1()` takes in an array, and transfers the first element of the array into the end of current buffer, and simultaneously drops the first element of the buffer.
      

      The add1() takes in an array, and transfers the first element of the array into the end of current buffer, and simultaneously drops the first element of the buffer.

      In [175]:
      xxxxxxxxxx
      
      4
       
      1
      class Buf(Buf):
      
      2
          def add1(self, items):
      
      3
              self.items.append(items.pop(0))
      
      4
              return self.items.pop(0)
      
      executed in 6ms, finished 04:51:50 2019-08-15
      . . .
      1
       
      1
      For equality between the buffer and an array, we only compare when both the array and the items are actually elements and not chunked arrays.
      

      For equality between the buffer and an array, we only compare when both the array and the items are actually elements and not chunked arrays.

      In [176]:
      xxxxxxxxxx
      
      5
       
      1
      class Buf(Buf):
      
      2
          def __eq__(self, items):
      
      3
              if any(isinstance(i, dict) for i in self.items): return False
      
      4
              if any(isinstance(i, dict) for i in items): return False
      
      5
              return items == self.items
      
      executed in 8ms, finished 04:51:50 2019-08-15
      . . .
      1
       
      1
      The `detect_chunks()` detects any repeating portions of a list of `n` size.
      

      The detect_chunks() detects any repeating portions of a list of n size.

      In [177]:
      xxxxxxxxxx
      
      14
       
      1
      def detect_chunks(n, lst_):
      
      2
          lst = list(lst_)
      
      3
          chunks = set()
      
      4
          last = Buf(n)
      
      5
          # check if the next_n elements are repeated.
      
      6
          for _ in range(len(lst) - n):
      
      7
              lnext_n = lst[0:n]
      
      8
              if last == lnext_n:
      
      9
                  # found a repetition.
      
      10
                  chunks.add(tuple(last.items))
      
      11
              else:
      
      12
                  pass
      
      13
              last.add1(lst)
      
      14
          return chunks
      
      executed in 8ms, finished 04:51:50 2019-08-15
      . . .
      1
       
      1
      Once we have detected plausible repeating sequences, we gather all similar sequences into arrays.
      

      Once we have detected plausible repeating sequences, we gather all similar sequences into arrays.

      In [178]:
      xxxxxxxxxx
      
      12
       
      1
      def chunkify(lst_,n , chunks):
      
      2
          lst = list(lst_)
      
      3
          chunked_lst = []
      
      4
          while len(lst) >= n:
      
      5
              lnext_n = lst[0:n]
      
      6
              if (not any(isinstance(i, dict) for i in lnext_n)) and tuple(lnext_n) in chunks:
      
      7
                  chunked_lst.append({'_':lnext_n})
      
      8
                  lst = lst[n:]
      
      9
              else:
      
      10
                  chunked_lst.append(lst.pop(0))
      
      11
          chunked_lst.extend(lst)
      
      12
          return chunked_lst
      
      executed in 7ms, finished 04:51:50 2019-08-15
      . . .
      1
       
      1
      The `identify_chunks()` simply calls the `detect_chunks()` on all given lists, and then converts all chunks identified into arrays.
      

      The identify_chunks() simply calls the detect_chunks() on all given lists, and then converts all chunks identified into arrays.

      In [179]:
      xxxxxxxxxx
      
      21
       
      1
      def identify_chunks(my_lsts):
      
      2
          # initialize
      
      3
          all_chunks = {}
      
      4
          maximum = max(len(lst) for lst in my_lsts)
      
      5
          for i in range(1, maximum//2+1):
      
      6
              all_chunks[i] = set()
      
      7
      ​
      
      8
          # First, identify chunks in each list.
      
      9
          for lst in my_lsts:
      
      10
              for i in range(1,maximum//2+1):
      
      11
                  chunks = detect_chunks(i, lst)
      
      12
                  all_chunks[i] |= chunks
      
      13
      ​
      
      14
          # Then, chunkify
      
      15
          new_lsts = []
      
      16
          for lst in my_lsts:
      
      17
              for i in range(1,maximum//2+1):
      
      18
                  chunks = all_chunks[i]
      
      19
                  lst = chunkify(lst, i, chunks)
      
      20
              new_lsts.append(lst)
      
      21
          return new_lsts
      
      executed in 9ms, finished 04:51:50 2019-08-15
      . . .
      3
       
      1
      ##### Prefix tree acceptor
      
      2
      ​
      
      3
      The prefix tree acceptor is a way to represent positive data. The `Node` class holds a single node in the prefix tree acceptor.
      
      Prefix tree acceptor¶

      The prefix tree acceptor is a way to represent positive data. The Node class holds a single node in the prefix tree acceptor.

      In [180]:
      xxxxxxxxxx
      
      56
       
      1
      class Node:
      
      2
          # Each tree node gets its unique id.
      
      3
          _uid = 0
      
      4
          def __init__(self, item):
      
      5
              # self.repeats = False
      
      6
              self.count = 1 # how many repetitions.
      
      7
              self.counters = set()
      
      8
              self.last = False
      
      9
              self.children = []
      
      10
              self.item = item
      
      11
              self.uid = Node._uid
      
      12
              Node._uid += 1
      
      13
      ​
      
      14
          def update_counters(self):
      
      15
              self.counters.add(self.count)
      
      16
              self.count = 0
      
      17
              for c in self.children:
      
      18
                  c.update_counters()
      
      19
      ​
      
      20
          def __repr__(self):
      
      21
              return str(self.to_json())
      
      22
      ​
      
      23
          def __str__(self):
      
      24
              return str("(%s, [%s])", (self.item, ' '.join([str(i) for i in self.children])))
      
      25
      ​
      
      26
          def to_json(self):
      
      27
              s = ("(%s)" % ' '.join(self.item['_'])) if isinstance(self.item, dict) else str(self.item)
      
      28
              return (s, tuple(self.counters), [i.to_json() for i in self.children])
      
      29
      ​
      
      30
          def inc_count(self):
      
      31
              self.count += 1
      
      32
      ​
      
      33
          def add_ref(self):
      
      34
              self.count = 1
      
      35
      ​
      
      36
          def get_child(self, c):
      
      37
              for i in self.children:
      
      38
                  if i.item == c: return i
      
      39
              return None
      
      40
      ​
      
      41
          def add_child(self, c):
      
      42
              # first check if it is the current node. If it is, increment
      
      43
              # count, and return ourselves.
      
      44
              if c == self.item:
      
      45
                  self.inc_count()
      
      46
                  return self
      
      47
              else:
      
      48
                  # check if it is one of the children. If it is a child, then
      
      49
                  # preserve its original count.
      
      50
                  nc = self.get_child(c)
      
      51
                  if nc is None:
      
      52
                      nc = Node(c)
      
      53
                      self.children.append(nc)
      
      54
                  else:
      
      55
                      nc.add_ref()
      
      56
                  return nc
      
      executed in 16ms, finished 04:51:50 2019-08-15
      . . .
      In [181]:
      xxxxxxxxxx
      
      74
       
      1
      def update_tree(lst_, root):
      
      2
          lst = list(lst_)
      
      3
          branch = root
      
      4
          while lst:
      
      5
              first, *lst = lst
      
      6
              branch = branch.add_child(first)
      
      7
          branch.last = True
      
      8
          return root
      
      9
      ​
      
      10
      def create_tree_with_lsts(lsts):
      
      11
          Node._uid = 0
      
      12
          root =  Node(None)
      
      13
          for lst in lsts:
      
      14
              root.count = 1 # there is at least one element.
      
      15
              update_tree(lst, root)
      
      16
              root.update_counters()
      
      17
          return root
      
      18
      ​
      
      19
      def get_star(node, key):
      
      20
          if node.item is None:
      
      21
              return ''
      
      22
          if isinstance(node.item, dict):
      
      23
              # take care of counters
      
      24
              elements = node.item['_']
      
      25
              my_key = "<%s-%d-s>" % (key, node.uid)
      
      26
              alts = [elements]
      
      27
              if len(node.counters) > 1: # repetition
      
      28
                  alts.append(elements + [my_key])
      
      29
              return [my_key], {my_key:alts}
      
      30
          else:
      
      31
              return [str(node.item)], {}
      
      32
      ​
      
      33
      def node_to_grammar(node, grammar, key):
      
      34
          rule = []
      
      35
          alts = [rule]
      
      36
          if node.uid == 0:
      
      37
              my_key = "<%s>" % key
      
      38
          else:
      
      39
              my_key = "<%s-%d>" % (key, node.uid)
      
      40
          grammar[my_key] = alts
      
      41
          if node.item is not None:
      
      42
              mk, g = get_star(node, key)
      
      43
              rule.extend(mk)
      
      44
              grammar.update(g)
      
      45
          # is the node last?
      
      46
          if node.last:
      
      47
              assert node.item is not None
      
      48
              # add a duplicate rule that ends here.
      
      49
              ending_rule = list(rule)
      
      50
              # if there are no children, the current rule is
      
      51
              # any way ending.
      
      52
              if node.children:
      
      53
                  alts.append(ending_rule)
      
      54
      ​
      
      55
          if node.children:
      
      56
              if len(node.children) > 1:
      
      57
                  my_ckey = "<%s-%d-c>" % (key, node.uid)
      
      58
                  rule.append(my_ckey)
      
      59
                  grammar[my_ckey] = [ ["<%s-%d>" % (key, c.uid)] for c in node.children]
      
      60
              else:
      
      61
                  my_ckey = "<%s-%d>" % (key, node.children[0].uid)
      
      62
                  rule.append(my_ckey)
      
      63
          else:
      
      64
              pass
      
      65
          for c in node.children:
      
      66
              node_to_grammar(c, grammar, key)
      
      67
          return grammar
      
      68
      ​
      
      69
      def generate_grammar(lists, key):
      
      70
          lsts = identify_chunks(lists)
      
      71
          tree = create_tree_with_lsts(lsts)
      
      72
          grammar = {}
      
      73
          node_to_grammar(tree, grammar, key)
      
      74
          return grammar
      
      executed in 34ms, finished 04:51:50 2019-08-15
      . . .
      1
       
      1
      Given a rule, determine the abstraction for it.
      

      Given a rule, determine the abstraction for it.

      In [182]:
      xxxxxxxxxx
      
      4
       
      1
      def collapse_alts(rules, k):
      
      2
          ss = [[str(r) for r in rule] for rule in rules]
      
      3
          x = generate_grammar(ss, k[1:-1])
      
      4
          return x
      
      executed in 8ms, finished 04:51:50 2019-08-15
      . . .
      In [183]:
      xxxxxxxxxx
      
      10
       
      1
      def collapse_rules(grammar):
      
      2
          r_grammar = {}
      
      3
          for k in grammar:
      
      4
              new_grammar = collapse_alts(grammar[k], k)
      
      5
              # merge the new_grammar with r_grammar
      
      6
              # we know none of the keys exist in r_grammar because
      
      7
              # new keys are k prefixed.
      
      8
              for k_ in new_grammar:
      
      9
                  r_grammar[k_] = new_grammar[k_]
      
      10
          return r_grammar
      
      executed in 8ms, finished 04:51:50 2019-08-15
      . . .
      In [184]:
      xxxxxxxxxx
      
      2
       
      1
      %top collapsed_calc_grammar = collapse_rules(ne_calc_grammar)
      
      2
      %top show_grammar(collapsed_calc_grammar)
      
      executed in 13ms, finished 04:51:50 2019-08-15
      Out[184]:
      {'<START>': [['<START-1>']],
       '<START-1>': [['<main>']],
       '<main>': [['<main-1>']],
       '<main-1>': [['<parse_expr>']],
       '<parse_expr>': [['<parse_expr-0-c>']],
       '<parse_expr-0-c>': [['<parse_expr-1>'], ['<parse_expr-3>']],
       '<parse_expr-1>': [['<parse_expr-1-s>', '<parse_expr-2>']],
       '<parse_expr-3>': [['<parse_expr:while_1 = [1.0]>']],
       '<parse_expr-1-s>': [['<parse_expr:while_1 = [1.0]>',
         '<parse_expr:while_1 - [2.0]>'],
        ['<parse_expr:while_1 = [1.0]>',
         '<parse_expr:while_1 - [2.0]>',
         '<parse_expr-1-s>']],
       '<parse_expr-2>': [['<parse_expr:while_1 = [1.0]>']],
       '<parse_expr:while_1 = [1.0]>': [['<parse_expr:while_1 = [1.0]-0-c>']],
       '<parse_expr:while_1 - [2.0]>': [['<parse_expr:while_1 - [2.0]-0-c>']],
       '<parse_expr:while_1 = [1.0]-0-c>': [['<parse_expr:while_1 = [1.0]-1>'],
        ['<parse_expr:while_1 = [1.0]-2>']],
       '<parse_expr:while_1 = [1.0]-1>': [['<parse_expr:if_1 = 2#[1.0, -1]>']],
       '<parse_expr:while_1 = [1.0]-2>': [['<parse_expr:if_1 = 0#[1.0, -1]>']],
       '<parse_expr:if_1 = 2#[1.0, -1]>': [['<parse_expr:if_1 = 2#[1.0, -1]-1>']],
       '<parse_expr:if_1 = 2#[1.0, -1]-1>': [['<parse_paren>']],
       '<parse_paren>': [['<parse_paren-1>']],
       '<parse_paren-1>': [['(', '<parse_paren-2>']],
       '<parse_paren-2>': [['<parse_expr>', '<parse_paren-3>']],
       '<parse_paren-3>': [[')']],
       '<parse_expr:if_1 = 0#[1.0, -1]>': [['<parse_expr:if_1 = 0#[1.0, -1]-1>']],
       '<parse_expr:if_1 = 0#[1.0, -1]-1>': [['<parse_num>']],
       '<parse_num>': [['<parse_num-1>']],
       '<parse_num-1>': [['<parse_num-1-s>']],
       '<parse_num-1-s>': [['<is_digit>'], ['<is_digit>', '<parse_num-1-s>']],
       '<is_digit>': [['<is_digit-0-c>']],
       '<is_digit-0-c>': [['<is_digit-10>'],
        ['<is_digit-1>'],
        ['<is_digit-2>'],
        ['<is_digit-3>'],
        ['<is_digit-4>'],
        ['<is_digit-5>'],
        ['<is_digit-6>'],
        ['<is_digit-7>'],
        ['<is_digit-8>'],
        ['<is_digit-9>']],
       '<is_digit-10>': [['4']],
       '<is_digit-1>': [['1']],
       '<is_digit-2>': [['6']],
       '<is_digit-3>': [['9']],
       '<is_digit-4>': [['2']],
       '<is_digit-5>': [['3']],
       '<is_digit-6>': [['8']],
       '<is_digit-7>': [['0']],
       '<is_digit-8>': [['5']],
       '<is_digit-9>': [['7']],
       '<parse_expr:while_1 - [2.0]-0-c>': [['<parse_expr:while_1 - [2.0]-1>'],
        ['<parse_expr:while_1 - [2.0]-2>'],
        ['<parse_expr:while_1 - [2.0]-3>'],
        ['<parse_expr:while_1 - [2.0]-4>']],
       '<parse_expr:while_1 - [2.0]-1>': [['/']],
       '<parse_expr:while_1 - [2.0]-2>': [['-']],
       '<parse_expr:while_1 - [2.0]-3>': [['*']],
       '<parse_expr:while_1 - [2.0]-4>': [['+']]}
      
      . . .
      In [185]:
      xxxxxxxxxx
      
      6
       
      1
      %%top
      
      2
      # [(
      
      3
      gf = GrammarFuzzer.GrammarFuzzer(to_fuzzable_grammar(ne_mathexpr_grammar), start_symbol='<START>')
      
      4
      for i in range(10):
      
      5
          print(gf.fuzz())
      
      6
      # )]
      
      executed in 174ms, finished 04:51:50 2019-08-15
      152 
       510+0 * 0
       402 *442
       311
       134+ 3
       1+3
      003 
      405+1 
      0
       4 * 2+443 *2
      
      . . .
      In [186]:
      xxxxxxxxxx
      
      2
       
      1
      %top collapsed_mathexpr_grammar = collapse_rules(ne_mathexpr_grammar)
      
      2
      %top show_grammar(collapsed_mathexpr_grammar)
      
      executed in 24ms, finished 04:51:50 2019-08-15
      Out[186]:
      {'<START>': [['<START-1>']],
       '<START-1>': [['<main>']],
       '<main>': [['<main-1>']],
       '<main-1>': [['<getValue>']],
       '<getValue>': [['<getValue-1>']],
       '<getValue-1>': [['<parseExpression>']],
       '<parseExpression>': [['<parseExpression-1>']],
       '<parseExpression-1>': [['<parseAddition>']],
       '<parseAddition>': [['<parseAddition-1>']],
       '<parseAddition-1>': [['<parseMultiplication>'],
        ['<parseMultiplication>', '<parseAddition-2>']],
       '<parseMultiplication>': [['<parseMultiplication-1>']],
       '<parseAddition-2>': [['<parseAddition:while_1 - [1.0]>']],
       '<parseMultiplication-1>': [['<parseParenthesis>'],
        ['<parseParenthesis>', '<parseMultiplication-2>']],
       '<parseParenthesis>': [['<parseParenthesis-0-c>']],
       '<parseMultiplication-2>': [['<parseMultiplication:while_1 - [1.0]>']],
       '<parseParenthesis-0-c>': [['<parseParenthesis-1>'],
        ['<parseParenthesis-3>']],
       '<parseParenthesis-1>': [['<skipWhitespace>', '<parseParenthesis-2>']],
       '<parseParenthesis-3>': [['<parseParenthesis:if_1 = 1#[-1]>']],
       '<skipWhitespace>': [['<skipWhitespace-1>']],
       '<parseParenthesis-2>': [['<parseParenthesis:if_1 = 1#[-1]>']],
       '<skipWhitespace-1>': [['<skipWhitespace:while_1 - [1.0]>']],
       '<skipWhitespace:while_1 - [1.0]>': [['<skipWhitespace:while_1 - [1.0]-1>']],
       '<skipWhitespace:while_1 - [1.0]-1>': [[' ']],
       '<parseParenthesis:if_1 = 1#[-1]>': [['<parseParenthesis:if_1 = 1#[-1]-1>']],
       '<parseParenthesis:if_1 = 1#[-1]-1>': [['<parseNegative>']],
       '<parseNegative>': [['<parseNegative-1>']],
       '<parseNegative-1>': [['<parseNegative:if_1 = 1#[-1]>']],
       '<parseNegative:if_1 = 1#[-1]>': [['<parseNegative:if_1 = 1#[-1]-1>']],
       '<parseNegative:if_1 = 1#[-1]-1>': [['<parseValue>']],
       '<parseValue>': [['<parseValue-1>']],
       '<parseValue-1>': [['<parseValue:if_1 = 0#[-1]>']],
       '<parseValue:if_1 = 0#[-1]>': [['<parseValue:if_1 = 0#[-1]-1>']],
       '<parseValue:if_1 = 0#[-1]-1>': [['<parseNumber>']],
       '<parseNumber>': [['<parseNumber-1>']],
       '<parseNumber-1>': [['<parseNumber-1-s>']],
       '<parseNumber-1-s>': [['<parseNumber:while_1 - [1.0]>'],
        ['<parseNumber:while_1 - [1.0]>', '<parseNumber-1-s>']],
       '<parseNumber:while_1 - [1.0]>': [['<parseNumber:while_1 - [1.0]-0-c>']],
       '<parseNumber:while_1 - [1.0]-0-c>': [['<parseNumber:while_1 - [1.0]-1>'],
        ['<parseNumber:while_1 - [1.0]-2>'],
        ['<parseNumber:while_1 - [1.0]-3>'],
        ['<parseNumber:while_1 - [1.0]-4>'],
        ['<parseNumber:while_1 - [1.0]-5>'],
        ['<parseNumber:while_1 - [1.0]-6>']],
       '<parseNumber:while_1 - [1.0]-1>': [['1']],
       '<parseNumber:while_1 - [1.0]-2>': [['2']],
       '<parseNumber:while_1 - [1.0]-3>': [['0']],
       '<parseNumber:while_1 - [1.0]-4>': [['3']],
       '<parseNumber:while_1 - [1.0]-5>': [['5']],
       '<parseNumber:while_1 - [1.0]-6>': [['4']],
       '<parseMultiplication:while_1 - [1.0]>': [['<parseMultiplication:while_1 - [1.0]-1>']],
       '<parseMultiplication:while_1 - [1.0]-1>': [['<skipWhitespace>'],
        ['<skipWhitespace>', '<parseMultiplication:while_1 - [1.0]-2>']],
       '<parseMultiplication:while_1 - [1.0]-2>': [['*',
         '<parseMultiplication:while_1 - [1.0]-3>']],
       '<parseMultiplication:while_1 - [1.0]-3>': [['<parseMultiplication:if_1 = 0#[1.0, -1]>']],
       '<parseMultiplication:if_1 = 0#[1.0, -1]>': [['<parseMultiplication:if_1 = 0#[1.0, -1]-1>']],
       '<parseMultiplication:if_1 = 0#[1.0, -1]-1>': [['<parseParenthesis>']],
       '<parseAddition:while_1 - [1.0]>': [['<parseAddition:while_1 - [1.0]-1>']],
       '<parseAddition:while_1 - [1.0]-1>': [['+',
         '<parseAddition:while_1 - [1.0]-2>']],
       '<parseAddition:while_1 - [1.0]-2>': [['<parseAddition:if_1 = 0#[1.0, -1]>']],
       '<parseAddition:if_1 = 0#[1.0, -1]>': [['<parseAddition:if_1 = 0#[1.0, -1]-1>']],
       '<parseAddition:if_1 = 0#[1.0, -1]-1>': [['<parseMultiplication>']]}
      
      . . .
      In [187]:
      xxxxxxxxxx
      
      6
       
      1
      %%top
      
      2
      # [(
      
      3
      gf = GrammarFuzzer.GrammarFuzzer(to_fuzzable_grammar(collapsed_mathexpr_grammar), start_symbol='<START>')
      
      4
      for i in range(10):
      
      5
          print(gf.fuzz())
      
      6
      # )]
      
      executed in 268ms, finished 04:51:50 2019-08-15
      404
       5052 
       154+ 3
      24 *5+5
      222 * 224+ 01 
       05+252 
      255
      2
      1+ 0110
       42
      
      . . .
      In [188]:
      xxxxxxxxxx
      
      6
       
      1
      %%top
      
      2
      # [(
      
      3
      gf = GrammarFuzzer.GrammarFuzzer(to_fuzzable_grammar(collapsed_calc_grammar), start_symbol='<START>')
      
      4
      for i in range(10):
      
      5
          print(gf.fuzz())
      
      6
      # )]
      
      executed in 1.41s, finished 04:51:52 2019-08-15
      00/(((7/99+((6)/(6/(1)))+7)))
      (1)+415748/8
      8
      451
      ((0273-948))
      (7-5*0)*(8+1+6*2)
      1
      67+(5)*3
      ((91/4))
      (9*1*(53))*(6/1)*182
      
      . . .
      In [189]:
      xxxxxxxxxx
      
      14
       
      1
      def convert_spaces(grammar):
      
      2
          keys = {key: key.replace(' ', '_') for key in grammar}
      
      3
          new_grammar = {}
      
      4
          for key in grammar:
      
      5
              new_alt = []
      
      6
              for rule in grammar[key]:
      
      7
                  new_rule = []
      
      8
                  for t in rule:
      
      9
                      for k in keys:
      
      10
                          t = t.replace(k, keys[k])
      
      11
                      new_rule.append(t)
      
      12
                  new_alt.append(''.join(new_rule))
      
      13
              new_grammar[keys[key]] = new_alt
      
      14
          return new_grammar
      
      executed in 10ms, finished 04:51:52 2019-08-15
      . . .
      In [190]:
      xxxxxxxxxx
      
      2
       
      1
      %top calc_grammar = convert_spaces(collapsed_calc_grammar)
      
      2
      %top show_grammar(calc_grammar, canonical=False)
      
      executed in 20ms, finished 04:51:52 2019-08-15
      Out[190]:
      {'<START>': ['<START-1>'],
       '<START-1>': ['<main>'],
       '<main>': ['<main-1>'],
       '<main-1>': ['<parse_expr>'],
       '<parse_expr>': ['<parse_expr-0-c>'],
       '<parse_expr-0-c>': ['<parse_expr-1>', '<parse_expr-3>'],
       '<parse_expr-1>': ['<parse_expr-1-s><parse_expr-2>'],
       '<parse_expr-3>': ['<parse_expr:while_1_=_[1.0]>'],
       '<parse_expr-1-s>': ['<parse_expr:while_1_=_[1.0]><parse_expr:while_1_-_[2.0]>',
        '<parse_expr:while_1_=_[1.0]><parse_expr:while_1_-_[2.0]><parse_expr-1-s>'],
       '<parse_expr-2>': ['<parse_expr:while_1_=_[1.0]>'],
       '<parse_expr:while_1_=_[1.0]>': ['<parse_expr:while_1_=_[1.0]-0-c>'],
       '<parse_expr:while_1_-_[2.0]>': ['<parse_expr:while_1_-_[2.0]-0-c>'],
       '<parse_expr:while_1_=_[1.0]-0-c>': ['<parse_expr:while_1_=_[1.0]-1>',
        '<parse_expr:while_1_=_[1.0]-2>'],
       '<parse_expr:while_1_=_[1.0]-1>': ['<parse_expr:if_1_=_2#[1.0,_-1]>'],
       '<parse_expr:while_1_=_[1.0]-2>': ['<parse_expr:if_1_=_0#[1.0,_-1]>'],
       '<parse_expr:if_1_=_2#[1.0,_-1]>': ['<parse_expr:if_1_=_2#[1.0,_-1]-1>'],
       '<parse_expr:if_1_=_2#[1.0,_-1]-1>': ['<parse_paren>'],
       '<parse_paren>': ['<parse_paren-1>'],
       '<parse_paren-1>': ['(<parse_paren-2>'],
       '<parse_paren-2>': ['<parse_expr><parse_paren-3>'],
       '<parse_paren-3>': [')'],
       '<parse_expr:if_1_=_0#[1.0,_-1]>': ['<parse_expr:if_1_=_0#[1.0,_-1]-1>'],
       '<parse_expr:if_1_=_0#[1.0,_-1]-1>': ['<parse_num>'],
       '<parse_num>': ['<parse_num-1>'],
       '<parse_num-1>': ['<parse_num-1-s>'],
       '<parse_num-1-s>': ['<is_digit>', '<is_digit><parse_num-1-s>'],
       '<is_digit>': ['<is_digit-0-c>'],
       '<is_digit-0-c>': ['<is_digit-10>',
        '<is_digit-1>',
        '<is_digit-2>',
        '<is_digit-3>',
        '<is_digit-4>',
        '<is_digit-5>',
        '<is_digit-6>',
        '<is_digit-7>',
        '<is_digit-8>',
        '<is_digit-9>'],
       '<is_digit-10>': ['4'],
       '<is_digit-1>': ['1'],
       '<is_digit-2>': ['6'],
       '<is_digit-3>': ['9'],
       '<is_digit-4>': ['2'],
       '<is_digit-5>': ['3'],
       '<is_digit-6>': ['8'],
       '<is_digit-7>': ['0'],
       '<is_digit-8>': ['5'],
       '<is_digit-9>': ['7'],
       '<parse_expr:while_1_-_[2.0]-0-c>': ['<parse_expr:while_1_-_[2.0]-1>',
        '<parse_expr:while_1_-_[2.0]-2>',
        '<parse_expr:while_1_-_[2.0]-3>',
        '<parse_expr:while_1_-_[2.0]-4>'],
       '<parse_expr:while_1_-_[2.0]-1>': ['/'],
       '<parse_expr:while_1_-_[2.0]-2>': ['-'],
       '<parse_expr:while_1_-_[2.0]-3>': ['*'],
       '<parse_expr:while_1_-_[2.0]-4>': ['+']}
      
      . . .
      In [191]:
      xxxxxxxxxx
      
      1
       
      1
      from fuzzingbook import GrammarFuzzer, GrammarMiner, Parser
      
      executed in 502ms, finished 04:51:52 2019-08-15
      . . .
      In [192]:
      xxxxxxxxxx
      
      1
       
      1
      %top gf = GrammarFuzzer.GrammarFuzzer(calc_grammar, start_symbol='<START>')
      
      executed in 8ms, finished 04:51:52 2019-08-15
      . . .
      In [193]:
      xxxxxxxxxx
      
      5
       
      1
      %%top
      
      2
      # [(
      
      3
      for i in range(10):
      
      4
          print(gf.fuzz())
      
      5
      # )]
      
      executed in 9.29s, finished 04:52:02 2019-08-15
      (9+7+1-6*(8/9*1))
      (79*489+(643-(89))-(6))
      ((64))-((0)*1)+0/0-0/1*65
      (2/95)
      ((107038*((((959385427839*(8+(53)))/2)-2/920)))+20266)
      152+89
      09
      (3+(434)-(55-9))
      ((345)/(((((96*474-(86*(((((5)-66)))*(9+(((20)*(6/(724))+073-((9)))-95)))))))))+36)
      (((628)))/((3315+4))+8/(78119)
      
      . . .
      1
       
      1
      ### Remove duplicate and redundant entries
      

      1.8.4  Remove duplicate and redundant entries¶

      In [194]:
      xxxxxxxxxx
      
      7
       
      1
      def first_in_chain(token, chain):
      
      2
          while True:
      
      3
              if token in chain:
      
      4
                  token = chain[token]
      
      5
              else:
      
      6
                  break
      
      7
          return token
      
      executed in 6ms, finished 04:52:02 2019-08-15
      . . .
      1
       
      1
      Return a new symbol for `grammar` based on `symbol_name`.
      

      Return a new symbol for grammar based on symbol_name.

      In [195]:
      xxxxxxxxxx
      
      10
       
      1
      def new_symbol(grammar, symbol_name="<symbol>"):
      
      2
          if symbol_name not in grammar:
      
      3
              return symbol_name
      
      4
      ​
      
      5
          count = 1
      
      6
          while True:
      
      7
              tentative_symbol_name = symbol_name[:-1] + "-" + repr(count) + ">"
      
      8
              if tentative_symbol_name not in grammar:
      
      9
                  return tentative_symbol_name
      
      10
              count += 1
      
      executed in 7ms, finished 04:52:02 2019-08-15
      . . .
      1
       
      1
      Replace keys that have a single token definition with the token in the defition.
      

      Replace keys that have a single token definition with the token in the defition.

      In [196]:
      xxxxxxxxxx
      
      15
       
      1
      def replacement_candidates(grammar):
      
      2
          to_replace = {}
      
      3
          for k in grammar:
      
      4
              if len(grammar[k]) != 1: continue
      
      5
              if k in {'<START>', '<main>'}: continue
      
      6
              rule = grammar[k][0]
      
      7
              res =  re.findall(RE_NONTERMINAL, rule)
      
      8
              if len(res) == 1:
      
      9
                  if len(res[0]) != len(rule): continue
      
      10
                  to_replace[k] = first_in_chain(res[0], to_replace)
      
      11
              elif len(res) == 0:
      
      12
                  to_replace[k] = first_in_chain(rule, to_replace)
      
      13
              else:
      
      14
                  continue # more than one.
      
      15
          return to_replace
      
      executed in 8ms, finished 04:52:02 2019-08-15
      . . .
      In [197]:
      xxxxxxxxxx
      
      12
       
      1
      def replace_key_by_new_key(grammar, keys_to_replace):
      
      2
          new_grammar = {}
      
      3
          for key in grammar:
      
      4
              new_rules = []
      
      5
              for rule in grammar[key]:
      
      6
                  for k in keys_to_replace:
      
      7
                      new_key = keys_to_replace[k]
      
      8
                      rule = rule.replace(k, keys_to_replace[k])
      
      9
                  new_rules.append(rule)
      
      10
              new_grammar[keys_to_replace.get(key, key)] = new_rules
      
      11
          assert len(grammar) == len(new_grammar)
      
      12
          return new_grammar
      
      executed in 7ms, finished 04:52:02 2019-08-15
      . . .
      In [198]:
      xxxxxxxxxx
      
      13
       
      1
      def replace_key_by_key(grammar, keys_to_replace):
      
      2
          new_grammar = {}
      
      3
          for key in grammar:
      
      4
              if key in keys_to_replace:
      
      5
                  continue
      
      6
              new_rules = []
      
      7
              for rule in grammar[key]:
      
      8
                  for k in keys_to_replace:
      
      9
                      new_key = keys_to_replace[k]
      
      10
                      rule = rule.replace(k, keys_to_replace[k])
      
      11
                  new_rules.append(rule)
      
      12
              new_grammar[key] = new_rules
      
      13
          return new_grammar
      
      executed in 7ms, finished 04:52:02 2019-08-15
      . . .
      In [199]:
      xxxxxxxxxx
      
      3
       
      1
      def remove_single_entries(grammar):
      
      2
          keys_to_replace = replacement_candidates(grammar)
      
      3
          return replace_key_by_key(grammar, keys_to_replace)
      
      executed in 5ms, finished 04:52:02 2019-08-15
      . . .
      1
       
      1
      Remove keys that have similar rules.
      

      Remove keys that have similar rules.

      In [200]:
      xxxxxxxxxx
      
      9
       
      1
      def collect_duplicate_rule_keys(grammar):
      
      2
          collect = {}
      
      3
          for k in grammar:
      
      4
              salt = str(sorted(grammar[k]))
      
      5
              if salt not in collect:
      
      6
                  collect[salt] = (k, set())
      
      7
              else:
      
      8
                  collect[salt][1].add(k)
      
      9
          return collect
      
      executed in 6ms, finished 04:52:02 2019-08-15
      . . .
      In [201]:
      xxxxxxxxxx
      
      13
       
      1
      def remove_duplicate_rule_keys(grammar):
      
      2
          g = grammar
      
      3
          while True:
      
      4
              collect = collect_duplicate_rule_keys(g)
      
      5
              keys_to_replace = {}
      
      6
              for salt in collect:
      
      7
                  k, st = collect[salt]
      
      8
                  for s in st:
      
      9
                      keys_to_replace[s] = k
      
      10
              if not keys_to_replace:
      
      11
                  break
      
      12
              g = replace_key_by_key(g, keys_to_replace)
      
      13
          return g
      
      executed in 7ms, finished 04:52:02 2019-08-15
      . . .
      1
       
      1
      Remove all the control flow vestiges from names, and simply name them sequentially.
      

      Remove all the control flow vestiges from names, and simply name them sequentially.

      In [202]:
      xxxxxxxxxx
      
      13
       
      1
      def collect_replacement_keys(grammar):
      
      2
          g = copy.deepcopy(grammar)
      
      3
          to_replace = {}
      
      4
          for k in grammar:
      
      5
              if ':' in k:
      
      6
                  first, rest = k.split(':')
      
      7
                  sym = new_symbol(g, symbol_name=first + '>')
      
      8
                  assert sym not in g
      
      9
                  g[sym] = None
      
      10
                  to_replace[k] = sym
      
      11
              else:
      
      12
                  continue
      
      13
          return to_replace
      
      executed in 8ms, finished 04:52:02 2019-08-15
      . . .
      In [203]:
      xxxxxxxxxx
      
      4
       
      1
      def cleanup_tokens(grammar):
      
      2
          keys_to_replace = collect_replacement_keys(grammar)
      
      3
          g = replace_key_by_new_key(grammar, keys_to_replace)
      
      4
          return g
      
      executed in 5ms, finished 04:52:02 2019-08-15
      . . .
      In [204]:
      xxxxxxxxxx
      
      15
       
      1
      def replaceAngular(grammar):
      
      2
          new_g = {}
      
      3
          replaced = False
      
      4
          for k in grammar:
      
      5
              new_rules = []
      
      6
              for rule in grammar[k]:
      
      7
                  new_rule = rule.replace('<>', '<openA><closeA>').replace('</>', '<openA>/<closeA>')
      
      8
                  if rule != new_rule:
      
      9
                      replaced = True
      
      10
                  new_rules.append(new_rule)
      
      11
              new_g[k] = new_rules
      
      12
          if replaced:
      
      13
              new_g['<openA>'] = ['<']
      
      14
              new_g['<closeA>'] = ['<']
      
      15
          return new_g
      
      executed in 8ms, finished 04:52:02 2019-08-15
      . . .
      4
       
      1
      Remove keys that are referred to only from a single rule, and which have a single alternative.
      
      2
      Import. This can't work on canonical representation. First, given a key, we figure out its distance to `<START>`.
      
      3
      ​
      
      4
      This is different from `remove_single_entries()` in that, there we do not care if the key is being used multiple times. Here, we only replace keys that are referred to only once.
      

      Remove keys that are referred to only from a single rule, and which have a single alternative. Import. This can't work on canonical representation. First, given a key, we figure out its distance to <START>.

      This is different from remove_single_entries() in that, there we do not care if the key is being used multiple times. Here, we only replace keys that are referred to only once.

      In [205]:
      xxxxxxxxxx
      
      1
       
      1
      import math
      
      executed in 5ms, finished 04:52:02 2019-08-15
      . . .
      In [206]:
      xxxxxxxxxx
      
      9
       
      1
      def len_to_start(item, parents, seen=None):
      
      2
          if seen is None: seen = set()
      
      3
          if item in seen:
      
      4
              return math.inf
      
      5
          seen.add(item)
      
      6
          if item == '<START>':
      
      7
              return 0
      
      8
          else:
      
      9
              return 1 + min(len_to_start(p, parents, seen) for p in parents[item])
      
      executed in 6ms, finished 04:52:02 2019-08-15
      . . .
      In [207]:
      xxxxxxxxxx
      
      2
       
      1
      def order_by_length_to_start(items, parents):
      
      2
          return sorted(items, key=lambda i: len_to_start(i, parents))
      
      executed in 6ms, finished 04:52:02 2019-08-15
      . . .
      1
       
      1
      Next, we generate a map of `child -> [parents]`.
      

      Next, we generate a map of child -> [parents].

      In [208]:
      xxxxxxxxxx
      
      15
       
      1
      def id_parents(grammar, key, seen=None, parents=None):
      
      2
          if parents is None:
      
      3
              parents = {}
      
      4
              seen = set()
      
      5
          if key in seen: return
      
      6
          seen.add(key)
      
      7
          for rule in grammar[key]:
      
      8
              res = re.findall(RE_NONTERMINAL, rule)
      
      9
              for token in res:
      
      10
                  if token.startswith('<') and token.endswith('>'):
      
      11
                      if token not in parents: parents[token] = list()
      
      12
                      parents[token].append(key)
      
      13
          for ckey in {i for i in  grammar if i not in seen}:
      
      14
              id_parents(grammar, ckey, seen, parents)
      
      15
          return parents
      
      executed in 8ms, finished 04:52:02 2019-08-15
      . . .
      1
       
      1
      Now, all together.
      

      Now, all together.

      In [209]:
      xxxxxxxxxx
      
      10
       
      1
      def remove_single_alts(grammar, start_symbol='<START>'):
      
      2
          single_alts = {p for p in grammar if len(grammar[p]) == 1 and p != start_symbol}
      
      3
      ​
      
      4
          child_parent_map = id_parents(grammar, start_symbol)
      
      5
      ​
      
      6
          single_refs = {p:child_parent_map[p] for p in single_alts if len(child_parent_map[p]) <= 1}
      
      7
      ​
      
      8
          keys_to_replace = {p:grammar[p][0] for p in order_by_length_to_start(single_refs, child_parent_map)}
      
      9
          g =  replace_key_by_key(grammar, keys_to_replace)
      
      10
          return g
      
      executed in 7ms, finished 04:52:02 2019-08-15
      . . .
      1
       
      1
      ## Accio Grammar
      

      1.9  Accio Grammar¶

      In [210]:
      xxxxxxxxxx
      
      2
       
      1
      import os
      
      2
      import hashlib
      
      executed in 5ms, finished 04:52:02 2019-08-15
      . . .
      In [211]:
      xxxxxxxxxx
      
      56
       
      1
      def accio_grammar(fname, src, samples, cache=True):
      
      2
          hash_id = hashlib.md5(json.dumps(samples).encode()).hexdigest()
      
      3
          cache_file = "build/%s_%s_generalized_tree.json" % (fname, hash_id)
      
      4
          if os.path.exists(cache_file) and cache:
      
      5
              with open(cache_file) as f:
      
      6
                  generalized_tree = json.load(f)
      
      7
          else:
      
      8
          # regenerate the program
      
      9
              program_src[fname] = src
      
      10
              with open('subjects/%s' % fname, 'w+') as f:
      
      11
                  print(src, file=f) 
      
      12
              resrc = rewrite(src, fname)
      
      13
              with open('build/%s' % fname, 'w+') as f:
      
      14
                  print(resrc, file=f)
      
      15
              os.makedirs('samples/%s' % fname, exist_ok=True)
      
      16
              sample_files = {("samples/%s/%d.csv"%(fname,i)):s for i,s in enumerate(samples)}
      
      17
              for k in sample_files:
      
      18
                  with open(k, 'w+') as f:
      
      19
                      print(sample_files[k], file=f)
      
      20
      ​
      
      21
              call_trace = []
      
      22
              for i in sample_files:
      
      23
                  thash_id = hashlib.md5(json.dumps(sample_files[i]).encode()).hexdigest()
      
      24
                  trace_cache_file = "build/%s_%s_trace.json" % (fname, thash_id)
      
      25
                  if os.path.exists(trace_cache_file) and cache:
      
      26
                      with open(trace_cache_file) as f:
      
      27
                          my_tree = f.read()
      
      28
                  else:
      
      29
                      my_tree = do(["python", "./build/%s" % fname, i]).stdout
      
      30
                      with open(trace_cache_file, 'w+') as f:
      
      31
                          print(my_tree, file=f)
      
      32
                  call_trace.append(json.loads(my_tree)[0])
      
      33
      ​
      
      34
              mined_tree = miner(call_trace)
      
      35
      ​
      
      36
              generalized_tree = generalize_iter(mined_tree)
      
      37
              # costly data structure.
      
      38
              with open(cache_file, 'w+') as f:
      
      39
                  json.dump(generalized_tree, f)
      
      40
          g = convert_to_grammar(generalized_tree)
      
      41
          with open('build/%s_grammar_1.json' % fname, 'w+') as f:
      
      42
              json.dump(g, f)
      
      43
          g = check_empty_rules(g)
      
      44
          with open('build/%s_grammar_2.json' % fname, 'w+') as f:
      
      45
              json.dump(g, f)
      
      46
          g = collapse_rules(g) # <- regex learner
      
      47
          with open('build/%s_grammar_3.json' % fname, 'w+') as f:
      
      48
              json.dump(g, f)
      
      49
          g = convert_spaces(g)
      
      50
          with open('build/%s_grammar_4.json' % fname, 'w+') as f:
      
      51
              json.dump(g, f)
      
      52
          e = remove_single_alts(cleanup_tokens(remove_duplicate_rule_keys(remove_single_entries(g))))
      
      53
          e = show_grammar(e, canonical=False)
      
      54
          with open('build/%s_grammar.json' % fname, 'w+') as f:
      
      55
              json.dump(e, f)
      
      56
          return e
      
      executed in 18ms, finished 04:52:02 2019-08-15
      . . .
      In [212]:
      xxxxxxxxxx
      
      1
       
      1
      %top  calc_grammar = accio_grammar('calculator.py', VARS['calc_src'], ['(1+2)-2', '11'])
      
      executed in 711ms, finished 04:52:03 2019-08-15
      . . .
      In [213]:
      xxxxxxxxxx
      
      1
       
      1
      %top calc_grammar
      
      executed in 10ms, finished 04:52:03 2019-08-15
      Out[213]:
      {'<START>': ['<parse_expr-1>'],
       '<parse_expr-1>': ['<parse_expr-3>',
        '<parse_expr-3><parse_expr><parse_expr-3>'],
       '<parse_expr-3>': ['(<parse_expr-1>)', '<is_digit-0-c>'],
       '<parse_expr>': ['+', '-'],
       '<is_digit-0-c>': ['1', '2']}
      
      . . .
      In [214]:
      xxxxxxxxxx
      
      1
       
      1
      %top gf = GrammarFuzzer.GrammarFuzzer(calc_grammar, start_symbol='<START>')
      
      executed in 6ms, finished 04:52:03 2019-08-15
      . . .
      In [215]:
      xxxxxxxxxx
      
      5
       
      1
      %%top
      
      2
      # [(
      
      3
      for i in range(10):
      
      4
          print(gf.fuzz())
      
      5
      # )]
      
      executed in 93ms, finished 04:52:03 2019-08-15
      (2)
      2+(2)
      ((2-((1)))-1)-((2)-(2))
      (1)+(((1))-(1+(((2-1))-((1-1)-(((2-(2-1))+2)+(((1-((2-1)))+1)-2))))))
      1+1
      2
      (((1+(((1-1))-(((1+2)+2))))))-1
      1
      1-1
      (2+((1)))
      
      . . .
      3
       
      1
      ## Libraries
      
      2
      ​
      
      3
      We need a few instrumented supporting libraries.
      

      1.10  Libraries¶

      We need a few instrumented supporting libraries.

      1
       
      1
      ### StringIO replacement
      

      1.10.1  StringIO replacement¶

      In [216]:
      xxxxxxxxxx
      
      275
       
      1
      %%var myio_src↔​
      
      executed in 9ms, finished 04:52:03 2019-08-15
      . . .
      In [217]:
      xxxxxxxxxx
      
      4
       
      1
      # [(
      
      2
      with open('build/myio.py', 'w+') as f:
      
      3
          print(VARS['myio_src'], file=f)
      
      4
      # )]
      
      executed in 8ms, finished 04:52:03 2019-08-15
      . . .
      1
       
      1
      ### ShLex Replacement
      

      1.10.2  ShLex Replacement¶

      In [218]:
      330
       
      1
      %%var mylex_src↔​
      
      executed in 12ms, finished 04:52:03 2019-08-15
      . . .
      In [219]:
      xxxxxxxxxx
      
      4
       
      1
      # [(
      
      2
      with open('build/mylex.py', 'w+') as f:
      
      3
          print(VARS['mylex_src'], file=f)
      
      4
      # )]
      
      executed in 7ms, finished 04:52:03 2019-08-15
      . . .
      1
       
      1
      # Evaluation
      

      2  Evaluation¶

      In [220]:
      xxxxxxxxxx
      
      1
       
      1
      import fuzzingbook
      
      executed in 5ms, finished 04:52:03 2019-08-15
      . . .
      In [221]:
      xxxxxxxxxx
      
      1
       
      1
      assert os.path.isfile('json.tar.gz') # for microjson validation
      
      executed in 6ms, finished 04:52:03 2019-08-15
      . . .
      1
       
      1
      ## Initialization
      

      2.1  Initialization¶

      In [222]:
      xxxxxxxxxx
      
      12
       
      1
      Max_Precision = 1000
      
      2
      Max_Recall = 1000
      
      3
      Autogram = {}
      
      4
      AutogramFuzz = {}
      
      5
      AutogramGrammar = {}
      
      6
      Mimid = {}
      
      7
      MimidFuzz = {}
      
      8
      MimidGrammar = {}
      
      9
      MaxTimeout = 60*60 # 60 minutes
      
      10
      MaxParseTimeout = 60*5
      
      11
      CHECK = {'cgidecode','calculator', 'mathexpr', 'urlparse', 'netrc', 'microjson'}
      
      12
      reset_generalizer()
      
      executed in 8ms, finished 04:52:03 2019-08-15
      . . .
      In [223]:
      xxxxxxxxxx
      
      86
       
      1
      def recover_grammar_with_taints(name, src, samples):
      
      2
          header = '''
      
      3
      import fuzzingbook.GrammarMiner
      
      4
      from fuzzingbook.GrammarMiner import Tracer
      
      5
      from fuzzingbook.InformationFlow import ostr
      
      6
      from fuzzingbook.GrammarMiner import TaintedScopedGrammarMiner as TSGM
      
      7
      from fuzzingbook.GrammarMiner import readable
      
      8
      ​
      
      9
      import subjects.autogram_%s
      
      10
      import fuzzingbook
      
      11
      ​
      
      12
      class ostr_new(ostr):
      
      13
          def __new__(cls, value, *args, **kw):
      
      14
              return str.__new__(cls, value)
      
      15
      ​
      
      16
          def __init__(self, value, taint=None, origin=None, **kwargs):
      
      17
              self.taint = taint
      
      18
      ​
      
      19
              if origin is None:
      
      20
                  origin = ostr.DEFAULT_ORIGIN
      
      21
              if isinstance(origin, int):
      
      22
                  self.origin = list(range(origin, origin + len(self)))
      
      23
              else:
      
      24
                  self.origin = origin
      
      25
              #assert len(self.origin) == len(self) <-- bug fix here.
      
      26
      ​
      
      27
      class ostr_new(ostr_new):
      
      28
          def create(self, res, origin=None):
      
      29
              return ostr_new(res, taint=self.taint, origin=origin)
      
      30
      ​
      
      31
          def __repr__(self):
      
      32
              # bugfix here.
      
      33
              return str.__repr__(self)
      
      34
       
      
      35
      def recover_grammar_with_taints(fn, inputs, **kwargs):
      
      36
          miner = TSGM()
      
      37
          for inputstr in inputs:
      
      38
              with Tracer(ostr_new(inputstr), **kwargs) as tracer:
      
      39
                  fn(tracer.my_input)
      
      40
              miner.update_grammar(tracer.my_input, tracer.trace)
      
      41
          return readable(miner.clean_grammar())
      
      42
      ​
      
      43
      def replaceAngular(grammar):
      
      44
          # special handling for Autogram because it does not look for <> and </>
      
      45
          # in rules, which messes up with parsing.
      
      46
          new_g = {}
      
      47
          replaced = False
      
      48
          for k in grammar:
      
      49
              new_rules = []
      
      50
              for rule in grammar[k]:
      
      51
                  new_rule = rule.replace('<>', '<openA><closeA>').replace('</>', '<openA>/<closeA>').replace('<lambda>','<openA>lambda<closeA>')
      
      52
                  if rule != new_rule:
      
      53
                      replaced = True
      
      54
                  new_rules.append(new_rule)
      
      55
              new_g[k] = new_rules
      
      56
          if replaced:
      
      57
              new_g['<openA>'] = ['<']
      
      58
              new_g['<closeA>'] = ['<']
      
      59
          return new_g
      
      60
      def replace_start(grammar):
      
      61
          assert '<start>' in grammar
      
      62
          start = grammar['<start>']
      
      63
          del grammar['<start>']
      
      64
          grammar['<START>'] = start
      
      65
          return replaceAngular(grammar)
      
      66
          
      
      67
      samples = [i.strip() for i in [
      
      68
      %s
      
      69
      ] if i.strip()]
      
      70
      import json
      
      71
      autogram_grammar_t = recover_grammar_with_taints(subjects.autogram_%s.main, samples)
      
      72
      print(json.dumps(replace_start(autogram_grammar_t)))
      
      73
      '''
      
      74
          mod_name = name.replace('.py','')
      
      75
          with open('./subjects/autogram_%s' % name, 'w+') as f:
      
      76
              print(src, file=f)
      
      77
       
      
      78
          with open('./build/autogram_%s' % name, 'w+') as f:
      
      79
              print(header % (mod_name, ',\n'.join([repr(i) for i in samples]), mod_name), file=f)
      
      80
        
      
      81
          with ExpectTimeout(MaxTimeout):
      
      82
              result = do(["python","./build/autogram_%s" % name], env={'PYTHONPATH':'.'}, log=True)
      
      83
              if result.stderr.strip():
      
      84
                  print(result.stderr, file=sys.stderr)
      
      85
              return show_grammar(json.loads(result.stdout), canonical=False)
      
      86
          return {}
      
      executed in 14ms, finished 04:52:03 2019-08-15
      . . .
      In [224]:
      xxxxxxxxxx
      
      1
       
      1
      from fuzzingbook.Parser import IterativeEarleyParser
      
      executed in 8ms, finished 04:52:03 2019-08-15
      . . .
      3
       
      1
      ### Check Recall
      
      2
      ​
      
      3
      How many of the *valid* inputs from the golden grammar can be recognized by a parser using our grammar?
      

      2.1.1  Check Recall¶

      How many of the valid inputs from the golden grammar can be recognized by a parser using our grammar?

      In [225]:
      xxxxxxxxxx
      
      23
       
      1
      def check_recall(golden_grammar, my_grammar, validator, maximum=Max_Recall, log=False):
      
      2
          my_count = maximum
      
      3
          ie = IterativeEarleyParser(my_grammar, start_symbol='<START>')
      
      4
          golden = GrammarFuzzer.GrammarFuzzer(golden_grammar, start_symbol='<START>')
      
      5
          success = 0
      
      6
          while my_count != 0:
      
      7
              src = golden.fuzz()
      
      8
              try:
      
      9
                  validator(src)
      
      10
                  my_count -= 1
      
      11
                  try:
      
      12
                      #print('?', repr(src), file=sys.stderr)
      
      13
                      for tree in ie.parse(src):
      
      14
                          success += 1
      
      15
                          break
      
      16
                      if log: print(maximum - my_count, '+', repr(src), success, file=sys.stderr)
      
      17
                  except:
      
      18
                      #print("Error:", sys.exc_info()[0], file=sys.stderr)
      
      19
                      if log: print(maximum - my_count, '-', repr(src), file=sys.stderr)
      
      20
                      pass
      
      21
              except:
      
      22
                  pass
      
      23
          return (success, maximum)
      
      executed in 13ms, finished 04:52:03 2019-08-15
      . . .
      2
       
      1
      ### Check Precision
      
      2
      How many of the inputs produced using our grammar are valid? (Accepted by the program).
      

      2.1.2  Check Precision¶

      How many of the inputs produced using our grammar are valid? (Accepted by the program).

      In [226]:
      xxxxxxxxxx
      
      10
       
      1
      def check_precision(name, grammar, maximum=Max_Precision, log=False):
      
      2
          success = 0
      
      3
          with ExpectError():
      
      4
              fuzzer = GrammarFuzzer.GrammarFuzzer(grammar, start_symbol='<START>')
      
      5
              for i in range(maximum):
      
      6
                  v = fuzzer.fuzz()
      
      7
                  c = check(v, name)
      
      8
                  success += (1 if c else 0)
      
      9
                  if log: print(i, repr(v), c)
      
      10
          return (success, maximum)
      
      executed in 10ms, finished 04:52:03 2019-08-15
      . . .
      1
       
      1
      ### Timer
      

      2.1.3  Timer¶

      In [227]:
      xxxxxxxxxx
      
      1
       
      1
      from datetime import datetime
      
      executed in 12ms, finished 04:52:03 2019-08-15
      . . .
      In [228]:
      xxxxxxxxxx
      
      7
       
      1
      class timeit():
      
      2
          def __enter__(self):
      
      3
              self.tic = datetime.now()
      
      4
              return self
      
      5
          def __exit__(self, *args, **kwargs):
      
      6
              self.delta = datetime.now() - self.tic
      
      7
              self.runtime = (self.delta.microseconds, self.delta)
      
      executed in 9ms, finished 04:52:03 2019-08-15
      . . .
      In [229]:
      xxxxxxxxxx
      
      1
       
      1
      from fuzzingbook.ExpectError import ExpectError, ExpectTimeout
      
      executed in 4ms, finished 04:52:03 2019-08-15
      . . .
      In [230]:
      xxxxxxxxxx
      
      1
       
      1
      from fuzzingbook.Parser import IterativeEarleyParser
      
      executed in 5ms, finished 04:52:03 2019-08-15
      . . .
      In [231]:
      xxxxxxxxxx
      
      3
       
      1
      def process(s):
      
      2
          # see the rewrite fn. We remove newlines from grammar training to make it easier to visualize
      
      3
          return s.strip().replace('\n', ' ')
      
      executed in 5ms, finished 04:52:03 2019-08-15
      . . .
      In [232]:
      xxxxxxxxxx
      
      10
       
      1
      def check_parse(grammar, inputstrs, start_symbol='<START>'):
      
      2
          count = 0
      
      3
          e = IterativeEarleyParser(grammar, start_symbol=start_symbol)
      
      4
          for s in inputstrs:
      
      5
              with ExpectError():
      
      6
                  with ExpectTimeout(MaxParseTimeout):
      
      7
                      for tree in e.parse(process(s)):
      
      8
                          count += 1
      
      9
                          break
      
      10
          return (count, len(inputstrs))
      
      executed in 6ms, finished 04:52:03 2019-08-15
      . . .
      In [233]:
      xxxxxxxxxx
      
      1
       
      1
      from fuzzingbook.ExpectError import ExpectError, ExpectTimeout
      
      executed in 6ms, finished 04:52:03 2019-08-15
      . . .
      In [234]:
      xxxxxxxxxx
      
      1
       
      1
      from fuzzingbook import GrammarFuzzer, Parser
      
      executed in 6ms, finished 04:52:03 2019-08-15
      . . .
      In [235]:
      xxxxxxxxxx
      
      4
       
      1
      def save_grammar(grammar, tool, program):
      
      2
          with open("build/%s-%s.grammar.json" % (tool, program), 'w+') as f:
      
      3
              json.dump(grammar, f)
      
      4
          return {k:sorted(grammar[k]) for k in grammar}
      
      executed in 7ms, finished 04:52:03 2019-08-15
      . . .
      In [236]:
      xxxxxxxxxx
      
      1
       
      1
      import string
      
      executed in 5ms, finished 04:52:03 2019-08-15
      . . .
      1
       
      1
      ## Subjects
      

      2.2  Subjects¶

      In [237]:
      xxxxxxxxxx
      
      15
       
      1
      Mimid_p = {}
      
      2
      Mimid_r = {}
      
      3
      Autogram_p = {}
      
      4
      Autogram_r = {}
      
      5
      ​
      
      6
      Mimid_t ={}
      
      7
      Autogram_t ={}
      
      8
      ​
      
      9
      for k in program_src:
      
      10
          Mimid_p[k] = None
      
      11
          Mimid_r[k] = None
      
      12
          Mimid_t[k] = None
      
      13
          Autogram_p[k] = None
      
      14
          Autogram_r[k] = None
      
      15
          Autogram_t[k] = None
      
      executed in 9ms, finished 04:52:03 2019-08-15
      . . .
      1
       
      1
      ### CGIDecode
      

      2.2.1  CGIDecode¶

      1
       
      1
      #### Golden Grammar
      

      2.2.1.1  Golden Grammar¶

      In [238]:
      xxxxxxxxxx
      
      1
       
      1
      import urllib.parse
      
      executed in 5ms, finished 04:52:03 2019-08-15
      . . .
      In [239]:
      xxxxxxxxxx
      
      14
       
      1
      cgidecode_golden = {
      
      2
        "<START>": [
      
      3
          "<cgidecode-s>"
      
      4
        ],
      
      5
        "<cgidecode-s>": [
      
      6
            '<cgidecode>',
      
      7
            '<cgidecode><cgidecode-s>'],
      
      8
        "<cgidecode>": [
      
      9
          "<single_char>",
      
      10
          "<percentage_char>"
      
      11
        ],
      
      12
        "<single_char>": list(string.ascii_lowercase + string.ascii_uppercase + string.digits + "-./_~"),
      
      13
        "<percentage_char>": [urllib.parse.quote(i) for i in string.punctuation if i not in  "-./_~"],
      
      14
      }
      
      executed in 14ms, finished 04:52:03 2019-08-15
      . . .
      1
       
      1
      #### Samples
      

      2.2.1.2  Samples¶

      In [240]:
      xxxxxxxxxx
      
      19
       
      1
      cgidecode_samples = [↔​
      
      19
      ]
      
      executed in 7ms, finished 04:52:03 2019-08-15
      . . .
      In [241]:
      xxxxxxxxxx
      
      3
       
      1
      with timeit() as t:
      
      2
          cgidecode_grammar = accio_grammar('cgidecode.py', VARS['cgidecode_src'], cgidecode_samples)
      
      3
      Mimid_t['cgidecode.py'] = t.runtime
      
      executed in 31.8s, finished 04:52:35 2019-08-15
      . . .
      1
       
      1
      #### Mimid
      

      2.2.1.3  Mimid¶

      In [242]:
      xxxxxxxxxx
      
      1
       
      1
      save_grammar(cgidecode_grammar, 'mimid', 'cgidecode')
      
      executed in 9ms, finished 04:52:35 2019-08-15
      Out[242]:
      {'<START>': ['<cgi_decode-1-s>'],
       '<cgi_decode-1-s>': ['<cgi_decode-1>', '<cgi_decode-1><cgi_decode-1-s>'],
       '<cgi_decode-1>': ['%<cgi_decode>',
        '&',
        '+',
        '-',
        '.',
        '/',
        '0',
        '1',
        '2',
        '3',
        '4',
        '5',
        '6',
        '7',
        '8',
        '9',
        ':',
        '=',
        '?',
        'A',
        'B',
        'C',
        'D',
        'E',
        'F',
        'G',
        'H',
        'I',
        'J',
        'K',
        'L',
        'M',
        'N',
        'O',
        'P',
        'Q',
        'R',
        'S',
        'T',
        'U',
        'V',
        'W',
        'X',
        'Y',
        'Z',
        '_',
        'a',
        'b',
        'c',
        'd',
        'e',
        'f',
        'g',
        'h',
        'i',
        'j',
        'k',
        'l',
        'm',
        'n',
        'o',
        'p',
        'q',
        'r',
        's',
        't',
        'u',
        'v',
        'w',
        'x',
        'y',
        'z',
        '~'],
       '<cgi_decode>': ['00',
        '20',
        '21',
        '22',
        '23',
        '24',
        '25',
        '26',
        '27',
        '28',
        '29',
        '2A',
        '2B',
        '2C',
        '2D',
        '2E',
        '2F',
        '2a',
        '2b',
        '2c',
        '2d',
        '2e',
        '2f',
        '3A',
        '3B',
        '3C',
        '3D',
        '3E',
        '3F',
        '3a',
        '3b',
        '3c',
        '3d',
        '3e',
        '3f',
        '40',
        '5B',
        '5C',
        '5D',
        '5E',
        '5F',
        '5b',
        '5c',
        '5d',
        '5e',
        '5f',
        '60',
        '7B',
        '7C',
        '7D',
        '7E',
        '7b',
        '7c',
        '7d',
        '7e']}
      
      . . .
      In [243]:
      xxxxxxxxxx
      
      4
       
      1
      if 'cgidecode' in CHECK:
      
      2
          result = check_precision('cgidecode.py', cgidecode_grammar)
      
      3
          Mimid_p['cgidecode.py'] = result
      
      4
          print(result)
      
      executed in 28.7s, finished 04:53:04 2019-08-15
      (1000, 1000)
      
      . . .
      In [244]:
      xxxxxxxxxx
      
      1
       
      1
      import subjects.cgidecode
      
      executed in 8ms, finished 04:53:04 2019-08-15
      . . .
      In [245]:
      xxxxxxxxxx
      
      4
       
      1
      if 'cgidecode' in CHECK:
      
      2
          result = check_recall(cgidecode_golden, cgidecode_grammar, subjects.cgidecode.main)
      
      3
          Mimid_r['cgidecode.py'] = result
      
      4
          print(result)
      
      executed in 6.40s, finished 04:53:10 2019-08-15
      (1000, 1000)
      
      . . .
      1
       
      1
      #### Autogram
      

      2.2.1.4  Autogram¶

      In [246]:
      xxxxxxxxxx
      
      4
       
      1
      %%time
      
      2
      with timeit() as t:
      
      3
          autogram_cgidecode_grammar_t = recover_grammar_with_taints('cgidecode.py', VARS['cgidecode_src'], cgidecode_samples)
      
      4
      Autogram_t['cgidecode.py'] = t.runtime
      
      executed in 24.9s, finished 04:53:35 2019-08-15
      CPU times: user 11.8 ms, sys: 8.75 ms, total: 20.6 ms
      Wall time: 24.9 s
      
      . . .
      In [247]:
      xxxxxxxxxx
      
      1
       
      1
      save_grammar(autogram_cgidecode_grammar_t, 'autogram_t', 'cgidecode')
      
      executed in 13ms, finished 04:53:35 2019-08-15
      Out[247]:
      {'<START>': ['<create@27:self>'],
       '<create@27:self>': ['-',
        '1',
        '<__init__@15:self>',
        '<__init__@15:self><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>',
        '<__init__@15:self><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>+<cgi_decode@19:c><cgi_decode@19:c>+me<cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>zin<cgi_decode@19:c><cgi_decode@19:c>oo<cgi_decode@19:c><cgi_decode@19:c>o<cgi_decode@19:c>g',
        '<__init__@15:self><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>',
        '<__init__@15:self><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>',
        '<__init__@15:self><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>e<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>a<cgi_decode@19:c>s=<cgi_decode@19:c><cgi_decode@19:c>1&ma<cgi_decode@19:c><cgi_decode@19:c>=<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>2<cgi_decode@23:digit_low>+2+%<cgi_decode@23:digit_high><cgi_decode@23:digit_low>+<cgi_decode@19:c>&',
        '<__init__@15:self><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>e<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>at<cgi_decode@19:c>s=<cgi_decode@19:c><cgi_decode@19:c>od&status=<cgi_decode@19:c>a<cgi_decode@19:c>p<cgi_decode@19:c>&',
        '<__init__@15:self><cgi_decode@19:c><cgi_decode@19:c>l<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>o<cgi_decode@19:c>l<cgi_decode@19:c>%2<cgi_decode@23:digit_low>',
        '<__init__@15:self><cgi_decode@19:c><cgi_decode@19:c>o<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low>%<cgi_decode@23:digit_high><cgi_decode@23:digit_low>%20<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>%20%23%20<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>en<cgi_decode@19:c>%20%2<cgi_decode@23:digit_low>',
        '<__init__@15:self><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%22<cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c>J%2<cgi_decode@23:digit_low><cgi_decode@19:c>B%2<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c>',
        '<__init__@15:self><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c>N%2<cgi_decode@23:digit_low><cgi_decode@19:c>V%2<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low>Ae%2<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%2<cgi_decode@23:digit_low><cgi_decode@19:c>f%2EB',
        '<__init__@15:self><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%3<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%3<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%<cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%<cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c>h%5<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%5<cgi_decode@23:digit_low><cgi_decode@19:c>D%5<cgi_decode@23:digit_low>DR%5<cgi_decode@23:digit_low>c<cgi_decode@19:c>%5<cgi_decode@23:digit_low><cgi_decode@19:c>',
        '<__init__@15:self><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%5<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%5<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%5<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%<cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%<cgi_decode@23:digit_high>b<cgi_decode@19:c><cgi_decode@19:c>%7<cgi_decode@23:digit_low>h<cgi_decode@19:c>%7<cgi_decode@23:digit_low>mB%7e<cgi_decode@19:c>c%7B<cgi_decode@19:c>',
        '<__init__@15:self><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%7<cgi_decode@23:digit_low>C<cgi_decode@19:c>%7<cgi_decode@23:digit_low><cgi_decode@19:c>',
        '<__init__@15:self><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%<cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c>F%3<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%3<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%3<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%3<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%3<cgi_decode@23:digit_low><cgi_decode@19:c><cgi_decode@19:c>%3Ay<cgi_decode@19:c>%3B<cgi_decode@19:c>q%3C<cgi_decode@19:c>',
        '<__init__@15:self><cgi_decode@19:c>t<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>/t<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>t/<cgi_decode@19:c><cgi_decode@19:c>g<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>a<cgi_decode@19:c>p<cgi_decode@19:c><cgi_decode@19:c>seri<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>ob<cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low>%<cgi_decode@23:digit_high>b%2<cgi_decode@23:digit_low>update%20logintable%20set%20pass<cgi_decode@19:c>d%3d%270wn3d%27%3b<cgi_decode@19:c>-%00',
        '<__init__@15:self><cgi_decode@19:c>t<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>/t<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>t/get<cgi_decode@19:c>ata<cgi_decode@19:c>php<cgi_decode@19:c>data<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c>cr<cgi_decode@19:c>pt%<cgi_decode@23:digit_high><cgi_decode@23:digit_low>src=%22<cgi_decode@31:t>tp%3a%2<cgi_decode@23:digit_low>%2f',
        '<__init__@15:self><cgi_decode@31:t><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>a<cgi_decode@19:c><cgi_decode@19:c>.c<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c><cgi_decode@23:digit_high><cgi_decode@23:digit_low><cgi_decode@19:c>a<cgi_decode@19:c><cgi_decode@19:c><cgi_decode@19:c>.<cgi_decode@19:c>s%22%<cgi_decode@23:digit_high>e%3c%2fsc<cgi_decode@19:c><cgi_decode@19:c>pt%3e',
        '<cgi_decode@19:c>',
        '<cgi_decode@23:digit_low>',
        'C',
        'H',
        'O',
        'S',
        'W',
        'a',
        'h',
        'n',
        'w',
        'y'],
       '<__init__@15:self>': ['<cgi_decode@19:c>', '<cgi_decode@31:t>'],
       '<cgi_decode@19:c>': ['%', '+', '<__add__@1115:other>', '<create@27:self>'],
       '<cgi_decode@23:digit_high>': ['2',
        '3',
        '4',
        '5',
        '6',
        '7',
        '<cgi_decode@19:c>',
        '<cgi_decode@23:digit_low>'],
       '<cgi_decode@23:digit_low>': ['0',
        '1',
        '2',
        '3',
        '4',
        '5',
        '6',
        '7',
        '8',
        '9',
        '<cgi_decode@19:c>',
        '<cgi_decode@23:digit_high>',
        'A',
        'B',
        'C',
        'D',
        'E',
        'F',
        'a',
        'b',
        'c',
        'd',
        'e',
        'f'],
       '<cgi_decode@31:t>': ['<__add__@1115:other>',
        '<__add__@1115:other>w',
        '<__add__@1115:self>',
        'h<__add__@1115:other>'],
       '<__add__@1115:other>': ['&',
        '-',
        '.',
        '/',
        '0',
        '1',
        '2',
        '3',
        '4',
        '5',
        '6',
        '7',
        '8',
        '9',
        ':',
        '<__add__@1115:self>',
        '<cgi_decode@19:c>',
        '<cgi_decode@23:digit_high>',
        '<cgi_decode@23:digit_low>',
        '=',
        '?',
        'A',
        'B',
        'C',
        'D',
        'E',
        'F',
        'G',
        'H',
        'I',
        'J',
        'K',
        'L',
        'M',
        'N',
        'O',
        'P',
        'Q',
        'R',
        'S',
        'T',
        'U',
        'V',
        'W',
        'X',
        'Y',
        'Z',
        '_',
        'a',
        'b',
        'c',
        'd',
        'e',
        'f',
        'g',
        'h',
        'i',
        'j',
        'k',
        'l',
        'm',
        'n',
        'o',
        'p',
        'q',
        'r',
        's',
        't',
        'u',
        'v',
        'w',
        'x',
        'y',
        'z',
        '~'],
       '<__add__@1115:self>': ['<__add__@1115:other>', '<create@27:self>']}
      
      . . .
      In [248]:
      xxxxxxxxxx
      
      4
       
      1
      if 'cgidecode' in CHECK:
      
      2
          result = check_precision('cgidecode.py', autogram_cgidecode_grammar_t)
      
      3
          Autogram_p['cgidecode.py'] = result
      
      4
          print(result)
      
      executed in 1m 9.15s, finished 04:54:44 2019-08-15
      (460, 1000)
      
      . . .
      In [249]:
      xxxxxxxxxx
      
      4
       
      1
      if 'cgidecode' in CHECK:
      
      2
          result = check_recall(cgidecode_golden, autogram_cgidecode_grammar_t, subjects.cgidecode.main)
      
      3
          Autogram_r['cgidecode.py'] = result
      
      4
          print(result)
      
      executed in 1m 4.02s, finished 04:55:48 2019-08-15
      (380, 1000)
      
      . . .
      1
       
      1
      ### Calculator
      

      2.2.2  Calculator¶

      1
       
      1
      #### Golden Grammar
      

      2.2.2.1  Golden Grammar¶

      In [250]:
      xxxxxxxxxx
      
      28
       
      1
      calc_golden = {
      
      2
        "<START>": [
      
      3
          "<expr>"
      
      4
        ],
      
      5
        "<expr>": [
      
      6
          "<term>+<expr>",
      
      7
          "<term>-<expr>",
      
      8
          "<term>"
      
      9
        ],
      
      10
        "<term>": [
      
      11
          "<factor>*<term>",
      
      12
          "<factor>/<term>",
      
      13
          "<factor>"
      
      14
        ],
      
      15
        "<factor>": [
      
      16
          "(<expr>)",
      
      17
          "<number>"
      
      18
        ],
      
      19
        "<number>": [
      
      20
          "<integer>.<integer>",
      
      21
          "<integer>"
      
      22
        ],
      
      23
        "<integer>": [
      
      24
          "<digit><integer>",
      
      25
          "<digit>"
      
      26
        ],
      
      27
        "<digit>": [ "0", "1", "2", "3", "4", "5", "6", "7", "8", "9" ]
      
      28
      }
      
      executed in 7ms, finished 04:55:48 2019-08-15
      . . .
      1
       
      1
      #### Samples
      

      2.2.2.2  Samples¶

      In [251]:
      xxxxxxxxxx
      
      15
       
      1
      calc_samples=[i.strip() for i in '''\
      
      2
      (1+2)*3/(423-334+9983)-5-((6)-(701))
      
      3
      (123+133*(12-3)/9+8)+33
      
      4
      (100)
      
      5
      21*3
      
      6
      33/44+2
      
      7
      100
      
      8
      23*234*22*4
      
      9
      (123+133*(12-3)/9+8)+33
      
      10
      1+2
      
      11
      31/20-2
      
      12
      555+(234-445)
      
      13
      1-(41/2)
      
      14
      443-334+33-222
      
      15
      '''.split('\n') if i.strip()]
      
      executed in 7ms, finished 04:55:48 2019-08-15
      . . .
      In [252]:
      xxxxxxxxxx
      
      4
       
      1
      %%time
      
      2
      with timeit() as t:
      
      3
          calc_grammar = accio_grammar('calculator.py', VARS['calc_src'], calc_samples)
      
      4
      Mimid_t['calculator.py'] = t.runtime
      
      executed in 6.73s, finished 04:55:55 2019-08-15
      CPU times: user 332 ms, sys: 387 ms, total: 719 ms
      Wall time: 6.72 s
      
      . . .
      1
       
      1
      #### Mimid
      

      2.2.2.3  Mimid¶

      In [253]:
      xxxxxxxxxx
      
      1
       
      1
      save_grammar(calc_grammar, 'mimid', 'calculator')
      
      executed in 9ms, finished 04:55:55 2019-08-15
      Out[253]:
      {'<START>': ['<parse_expr-0-c>'],
       '<parse_expr-0-c>': ['<parse_expr-1>', '<parse_expr-2-s><parse_expr-1>'],
       '<parse_expr-1>': ['(<parse_expr-0-c>)', '<parse_num-1-s>'],
       '<parse_expr-2-s>': ['<parse_expr-1><parse_expr>',
        '<parse_expr-1><parse_expr><parse_expr-2-s>'],
       '<parse_num-1-s>': ['<is_digit-0-c>', '<is_digit-0-c><parse_num-1-s>'],
       '<is_digit-0-c>': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],
       '<parse_expr>': ['*', '+', '-', '/']}
      
      . . .
      In [254]:
      xxxxxxxxxx
      
      4
       
      1
      if 'calculator' in CHECK:
      
      2
          result = check_precision('calculator.py', calc_grammar)
      
      3
          Mimid_p['calculator.py'] = result
      
      4
          print(result)
      
      executed in 36.9s, finished 04:56:32 2019-08-15
      (1000, 1000)
      
      . . .
      In [255]:
      xxxxxxxxxx
      
      1
       
      1
      import subjects.calculator
      
      executed in 8ms, finished 04:56:32 2019-08-15
      . . .
      In [256]:
      xxxxxxxxxx
      
      4
       
      1
      if 'calculator' in CHECK:
      
      2
          result = check_recall(calc_golden, calc_grammar, subjects.calculator.main)
      
      3
          Mimid_r['calculator.py'] = result
      
      4
          print(result)
      
      executed in 33.4s, finished 04:57:05 2019-08-15
      (1000, 1000)
      
      . . .
      1
       
      1
      #### Autogram
      

      2.2.2.4  Autogram¶

      In [257]:
      xxxxxxxxxx
      
      4
       
      1
      %%time
      
      2
      with timeit() as t:
      
      3
          autogram_calc_grammar_t = recover_grammar_with_taints('calculator.py', VARS['calc_src'], calc_samples)
      
      4
      Autogram_t['calculator.py'] = t.runtime
      
      executed in 6.56s, finished 04:57:12 2019-08-15
      CPU times: user 9.24 ms, sys: 6.19 ms, total: 15.4 ms
      Wall time: 6.55 s
      
      . . .
      In [258]:
      xxxxxxxxxx
      
      1
       
      1
      save_grammar(autogram_calc_grammar_t, 'autogram_t', 'calculator')
      
      executed in 17ms, finished 04:57:12 2019-08-15
      Out[258]:
      {'<START>': ['<__init__@15:self>'],
       '<__init__@15:self>': ['<parse_expr@26:c>00',
        '<parse_expr@26:c>1+2)<parse_expr@26:c><parse_expr@26:c><parse_expr@26:c>(423<parse_expr@26:c>334+9983)-<parse_expr@26:c>-((6)-(701))',
        '<parse_expr@26:c>100)',
        '<parse_expr@26:c>12<parse_expr@26:c><parse_expr@26:c>1<parse_expr@29:num>*(12-3)/9+8)+33',
        '<parse_expr@26:c>1<parse_expr@26:c><parse_expr@26:c>',
        '<parse_expr@26:c>1<parse_expr@26:c><parse_expr@26:c>0<parse_expr@26:c>2',
        '<parse_expr@26:c>3<parse_expr@26:c><parse_expr@26:c>4<parse_expr@26:c><parse_expr@26:c>',
        '<parse_expr@26:c>3<parse_expr@26:c><parse_expr@29:num><parse_expr@26:c>*<parse_expr@29:num>*4',
        '<parse_expr@26:c>4<parse_expr@26:c><parse_expr@26:c><parse_expr@29:num><parse_expr@26:c>33-<parse_expr@26:c>22',
        '<parse_expr@26:c>55<parse_expr@26:c><parse_expr@26:c>234-445)',
        '<parse_expr@26:c><parse_expr@26:c><parse_expr@26:c>',
        '<parse_expr@26:c><parse_expr@26:c><parse_expr@26:c>41/2)'],
       '<parse_expr@26:c>': ['(',
        '*',
        '+',
        '-',
        '/',
        '1',
        '2',
        '3',
        '4',
        '5',
        '<parse_expr@29:num>'],
       '<parse_expr@29:num>': ['1', '2', '22', '23', '3', '33', '4', '5']}
      
      . . .
      In [259]:
      xxxxxxxxxx
      
      4
       
      1
      if 'calculator' in CHECK:
      
      2
          result = check_precision('calculator.py', autogram_calc_grammar_t)
      
      3
          Autogram_p['calculator.py'] = result
      
      4
          print(result)
      
      executed in 33.8s, finished 04:57:46 2019-08-15
      (395, 1000)
      
      . . .
      In [260]:
      xxxxxxxxxx
      
      4
       
      1
      if 'calculator' in CHECK:
      
      2
          result = check_recall(calc_golden, autogram_calc_grammar_t, subjects.calculator.main)
      
      3
          Autogram_r['calculator.py'] = result
      
      4
          print(result)
      
      executed in 28.9s, finished 04:58:15 2019-08-15
      (1, 1000)
      
      . . .
      1
       
      1
      ### MathExpr
      

      2.2.3  MathExpr¶

      1
       
      1
      #### Golden Grammar
      

      2.2.3.1  Golden Grammar¶

      In [261]:
      xxxxxxxxxx
      
      65
       
      1
      mathexpr_golden = {
      
      2
        "<START>": [
      
      3
          "<expr>"
      
      4
        ],
      
      5
        "<word>": [
      
      6
          "pi",
      
      7
          "e",
      
      8
          "phi",
      
      9
          "abs",
      
      10
          "acos",
      
      11
          "asin",
      
      12
          "atan",
      
      13
          "atan2",
      
      14
          "ceil",
      
      15
          "cos",
      
      16
          "cosh",
      
      17
          "degrees",
      
      18
          "exp",
      
      19
          "fabs",
      
      20
          "floor",
      
      21
          "fmod",
      
      22
          "frexp",
      
      23
          "hypot",
      
      24
          "ldexp",
      
      25
          "log",
      
      26
          "log10",
      
      27
          "modf",
      
      28
          "pow",
      
      29
          "radians",
      
      30
          "sin",
      
      31
          "sinh",
      
      32
          "sqrt",
      
      33
          "tan",
      
      34
          "tanh",
      
      35
          "<alpha>"
      
      36
        ],
      
      37
        "<alpha>": [ "a", "b", "c", "d", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"],
      
      38
        "<expr>": [
      
      39
          "<term>+<expr>",
      
      40
          "<term>-<expr>",
      
      41
          "<term>"
      
      42
        ],
      
      43
        "<term>": [
      
      44
          "<factor>*<term>",
      
      45
          "<factor>/<term>",
      
      46
          "<factor>"
      
      47
        ],
      
      48
        "<factor>": [
      
      49
          "+<factor>",
      
      50
          "-<factor>",
      
      51
          "(<expr>)",
      
      52
          "<word>(<expr>,<expr>)",
      
      53
          "<word>",
      
      54
          "<number>"
      
      55
        ],
      
      56
        "<number>": [
      
      57
          "<integer>.<integer>",
      
      58
          "<integer>"
      
      59
        ],
      
      60
        "<integer>": [
      
      61
          "<digit><integer>",
      
      62
          "<digit>"
      
      63
        ],
      
      64
        "<digit>": [ "0", "1", "2", "3", "4", "5", "6", "7", "8", "9" ]
      
      65
      }
      
      executed in 10ms, finished 04:58:15 2019-08-15
      . . .
      1
       
      1
      #### Samples
      

      2.2.3.2  Samples¶

      In [262]:
      xxxxxxxxxx
      
      34
       
      1
      mathexpr_samples=[i.strip() for i in '''
      
      2
      (pi*e+2)*3/(423-334+9983)-5-((6)-(701-x))
      
      3
      (123+133*(12-3)/9+8)+33
      
      4
      (100)
      
      5
      pi * e
      
      6
      (1 - 1 + -1) * pi
      
      7
      1.0 / 3 * 6
      
      8
      (x + e * 10) / 10
      
      9
      (a + b) / c
      
      10
      1 + pi / 4
      
      11
      (1-2)/3.0 + 0.0000
      
      12
      -(1 + 2) * 3
      
      13
      (1 + 2) * 3
      
      14
      100
      
      15
      1 + 2 * 3
      
      16
      23*234*22*4
      
      17
      (123+133*(12-3)/9+8)+33
      
      18
      1+2
      
      19
      31/20-2
      
      20
      555+(234-445)
      
      21
      1-(41/2)
      
      22
      443-334+33-222
      
      23
      cos(x+4*3) + 2 * 3
      
      24
      exp(0)
      
      25
      -(1 + 2) * 3
      
      26
      (1-2)/3.0 + 0.0000
      
      27
      abs(-2) + pi / 4
      
      28
      (pi + e * 10) / 10
      
      29
      1.0 / 3 * 6
      
      30
      cos(pi) * 1
      
      31
      atan2(2, 1)
      
      32
      hypot(5, 12)
      
      33
      pow(3, 5)
      
      34
      '''.strip().split('\n') if i.strip()]
      
      executed in 11ms, finished 04:58:15 2019-08-15
      . . .
      1
       
      1
      #### Mimid
      

      2.2.3.3  Mimid¶

      In [263]:
      xxxxxxxxxx
      
      4
       
      1
      %%time
      
      2
      with timeit() as t:
      
      3
          mathexpr_grammar = accio_grammar('mathexpr.py', VARS['mathexpr_src'], mathexpr_samples, cache=False)
      
      4
      Mimid_t['mathexpr.py'] = t.runtime
      
      executed in 17.0s, finished 04:58:32 2019-08-15
      CPU times: user 1.03 s, sys: 922 ms, total: 1.95 s
      Wall time: 17 s
      
      . . .
      In [264]:
      xxxxxxxxxx
      
      1
       
      1
      save_grammar(mathexpr_grammar, 'mimid', 'mathexpr')
      
      executed in 15ms, finished 04:58:32 2019-08-15
      Out[264]:
      {'<START>': ['<parseAddition-1>'],
       '<parseAddition-1>': ['<parseMultiplication-1>',
        '<parseMultiplication-1><parseAddition-2-s>'],
       '<parseMultiplication-1>': ['<parseParenthesis-0-c>',
        '<parseParenthesis-0-c><parseMultiplication-2-s>'],
       '<parseAddition-2-s>': ['<parseAddition>',
        '<parseAddition><parseAddition-2-s>'],
       '<parseParenthesis-0-c>': [' <parseNegative-0-c>',
        '(<parseAddition-1>)',
        '<parseNegative-0-c>'],
       '<parseMultiplication-2-s>': ['<parseMultiplication>',
        '<parseMultiplication><parseMultiplication-2-s>'],
       '<parseNegative-0-c>': ['-<parseParenthesis-0-c>', '<parseValue-0-c>'],
       '<parseValue-0-c>': ['<parseNumber-1-s>', '<parseVariable-0-c>'],
       '<parseNumber-1-s>': ['<parseNumber>', '<parseNumber><parseNumber-1-s>'],
       '<parseVariable-0-c>': ['a',
        'a<parseVariable-9-c>',
        'b',
        'c',
        'co<parseVariable-11>',
        'e',
        'exp<parseArguments-1>',
        'hypot<parseArguments-1>',
        'p<parseVariable-1-c>',
        'x'],
       '<parseNumber>': ['.', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],
       '<parseVariable-9-c>': ['b<parseVariable-11>', 'tan2<parseArguments-1>'],
       '<parseVariable-11>': ['s<parseArguments-1>'],
       '<parseArguments-1>': ['(<parseAddition-1><parseArguments-2-c>'],
       '<parseVariable-1-c>': ['i', 'ow<parseArguments-1>'],
       '<parseArguments-2-c>': [')', ', <parseAddition-1>)'],
       '<parseMultiplication>': ['<parseMultiplication-2>',
        '<parseMultiplication-3>',
        '<parseMultiplication-5>'],
       '<parseMultiplication-2>': ['*<parseParenthesis-0-c>'],
       '<parseMultiplication-3>': [' ', ' <parseMultiplication-4>'],
       '<parseMultiplication-5>': ['/<parseParenthesis-0-c>'],
       '<parseMultiplication-4>': ['<parseMultiplication-2>',
        '<parseMultiplication-5>'],
       '<parseAddition>': ['+<parseMultiplication-1>', '-<parseMultiplication-1>']}
      
      . . .
      In [265]:
      xxxxxxxxxx
      
      4
       
      1
      if 'mathexpr' in CHECK:
      
      2
          result = check_precision('mathexpr.py', mathexpr_grammar)
      
      3
          Mimid_p['mathexpr.py'] = result
      
      4
          print(result)
      
      executed in 7m 23s, finished 05:05:54 2019-08-15
      (699, 1000)
      
      . . .
      In [266]:
      xxxxxxxxxx
      
      1
       
      1
      import subjects.mathexpr
      
      executed in 9ms, finished 05:05:54 2019-08-15
      . . .
      In [267]:
      xxxxxxxxxx
      
      4
       
      1
      if 'mathexpr' in CHECK:
      
      2
          result = check_recall(mathexpr_golden, mathexpr_grammar, subjects.mathexpr.main)
      
      3
          Mimid_r['mathexpr.py'] = result
      
      4
          print(result)
      
      executed in 6m 54s, finished 05:12:48 2019-08-15
      (922, 1000)
      
      . . .
      1
       
      1
      #### Autogram
      

      2.2.3.4  Autogram¶

      In [268]:
      xxxxxxxxxx
      
      4
       
      1
      %%time
      
      2
      with timeit() as t:
      
      3
          autogram_mathexpr_grammar_t = recover_grammar_with_taints('mathexpr.py',  VARS['mathexpr_src'], mathexpr_samples)
      
      4
      Autogram_t['mathexpr.py'] = t.runtime
      
      executed in 26.5s, finished 05:13:15 2019-08-15
      CPU times: user 11.4 ms, sys: 9.45 ms, total: 20.8 ms
      Wall time: 26.5 s
      
      . . .
      In [269]:
      xxxxxxxxxx
      
      1
       
      1
      save_grammar(autogram_mathexpr_grammar_t, 'autogram_t', 'mathexpr')
      
      executed in 9ms, finished 05:13:15 2019-08-15
      Out[269]:
      {'<START>': ['<hasnext@61:self.string>'],
       '<hasnext@61:self.string>': ['<parseparenthesis@137:char> <parsemultiplication@111:char> 2 * 3',
        '<parseparenthesis@137:char> <parsemultiplication@111:char> pi / 4',
        '<parseparenthesis@137:char>(1 + 2) <parsemultiplication@111:char> 3',
        '<parseparenthesis@137:char>.0 <parsemultiplication@111:char> 3 <parsemultiplication@111:char> 6',
        '<parseparenthesis@137:char>00',
        '<parseparenthesis@137:char>1 + 2) <parsemultiplication@111:char> 3',
        '<parseparenthesis@137:char>1 - 1 + -1) <parsemultiplication@111:char> pi',
        '<parseparenthesis@137:char>1-2)<parsemultiplication@111:char>3.0 <parsemultiplication@111:char> 0.0000',
        '<parseparenthesis@137:char>100)',
        '<parseparenthesis@137:char>123<parsemultiplication@111:char>133*(12-3)/9+8)+33',
        '<parseparenthesis@137:char>1<parsemultiplication@111:char>20<parsemultiplication@111:char>2',
        '<parseparenthesis@137:char>3<parsemultiplication@111:char>234*22*4',
        '<parseparenthesis@137:char>43<parsemultiplication@111:char>334<parseaddition@93:char>33-222',
        '<parseparenthesis@137:char>55<parsemultiplication@111:char>(234-445)',
        '<parseparenthesis@137:char><parsemultiplication@111:char>(41/2)',
        '<parseparenthesis@137:char><parsemultiplication@111:char>2',
        '<parseparenthesis@137:char>a + b) <parsemultiplication@111:char> c',
        '<parseparenthesis@137:char>bs(-2) <parsemultiplication@111:char> pi / 4',
        '<parseparenthesis@137:char>i <parsemultiplication@111:char> e',
        '<parseparenthesis@137:char>os(pi) <parsemultiplication@111:char> 1',
        '<parseparenthesis@137:char>os(x<parsemultiplication@111:char>4*3) + 2 * 3',
        '<parseparenthesis@137:char>ow(3, 5)',
        '<parseparenthesis@137:char>pi + e * 10) <parsemultiplication@111:char> 10',
        '<parseparenthesis@137:char>pi<parsemultiplication@111:char>e+2)*3<parsemultiplication@111:char>(423<parsemultiplication@111:char>334+9983)-5-((6)-(701-x))',
        '<parseparenthesis@137:char>tan2(2, 1)',
        '<parseparenthesis@137:char>x + e * 10) <parsemultiplication@111:char> 10',
        '<parseparenthesis@137:char>xp(0)',
        '<parseparenthesis@137:char>ypot(5, 12)'],
       '<parseparenthesis@137:char>': ['(',
        '-',
        '1',
        '2',
        '3',
        '4',
        '5',
        'a',
        'c',
        'e',
        'h',
        'p'],
       '<parsemultiplication@111:char>': ['*', '/', '<parseaddition@93:char>'],
       '<parseaddition@93:char>': ['+', '-']}
      
      . . .
      In [270]:
      xxxxxxxxxx
      
      4
       
      1
      if 'mathexpr' in CHECK:
      
      2
          result = check_precision('mathexpr.py', autogram_mathexpr_grammar_t)
      
      3
          Autogram_p['mathexpr.py'] = result
      
      4
          print(result)
      
      executed in 24.2s, finished 05:13:39 2019-08-15
      (301, 1000)
      
      . . .
      In [271]:
      xxxxxxxxxx
      
      4
       
      1
      if 'mathexpr' in CHECK:
      
      2
          result = check_recall(mathexpr_golden, autogram_mathexpr_grammar_t, subjects.mathexpr.main)
      
      3
          Autogram_r['mathexpr.py'] = result
      
      4
          print(result)
      
      executed in 7m 36s, finished 05:21:15 2019-08-15
      (0, 1000)
      
      . . .
      1
       
      1
      ### URLParse
      

      2.2.4  URLParse¶

      1
       
      1
      #### Golden Grammar
      

      2.2.4.1  Golden Grammar¶

      In [272]:
      xxxxxxxxxx
      
      87
       
      1
      urlparse_golden = {
      
      2
        "<START>": [
      
      3
          "<url>"
      
      4
        ],
      
      5
        "<url>": [
      
      6
          "<scheme>://<authority><path><query>"
      
      7
        ],
      
      8
        "<scheme>": [
      
      9
          "http",
      
      10
          "https",
      
      11
          "ftp",
      
      12
          "ftps"
      
      13
        ],
      
      14
        "<authority>": [
      
      15
          "<host>",
      
      16
          "<host>:<port>",
      
      17
          "<userinfo>@<host>",
      
      18
          "<userinfo>@<host>:<port>"
      
      19
        ],
      
      20
        "<user>": [
      
      21
          "user1",
      
      22
          "user2",
      
      23
          "user3",
      
      24
          "user4",
      
      25
          "user5"
      
      26
        ],
      
      27
        "<pass>": [
      
      28
          "pass1",
      
      29
          "pass2",
      
      30
          "pass3",
      
      31
          "pass4",
      
      32
          "pass5"
      
      33
        ],
      
      34
      ​
      
      35
        "<host>": [
      
      36
          "host1",
      
      37
          "host2",
      
      38
          "host3",
      
      39
          "host4",
      
      40
          "host5"
      
      41
        ],
      
      42
        "<port>": [
      
      43
          "<nat>"
      
      44
        ],
      
      45
        "<nat>": [
      
      46
          "10",
      
      47
          "20",
      
      48
          "30",
      
      49
          "40",
      
      50
          "50"
      
      51
        ],
      
      52
        "<userinfo>": [
      
      53
          "<user>:<pass>"
      
      54
        ],
      
      55
        "<path>": [
      
      56
          "",
      
      57
          "/",
      
      58
          "/<id>",
      
      59
          "/<id><path>"
      
      60
        ],
      
      61
        "<id>": [
      
      62
          "folder"
      
      63
        ],
      
      64
        "<query>": [
      
      65
          "",
      
      66
          "?<params>"
      
      67
        ],
      
      68
        "<params>": [
      
      69
          "<param>",
      
      70
          "<param>&<params>"
      
      71
        ],
      
      72
        "<param>": [
      
      73
          "<key>=<value>"
      
      74
        ],
      
      75
        "<key>": [
      
      76
          "key1",
      
      77
          "key2",
      
      78
          "key3",
      
      79
          "key4"
      
      80
        ],
      
      81
        "<value>": [
      
      82
          "value1",
      
      83
          "value2",
      
      84
          "value3",
      
      85
          "value4"
      
      86
        ]
      
      87
      }
      
      executed in 9ms, finished 05:21:15 2019-08-15
      . . .
      1
       
      1
      #### Samples
      

      2.2.4.2  Samples¶

      In [273]:
      xxxxxxxxxx
      
      84
       
      1
      urlparse_samples = [i.strip() for i in '''
      
      2
      http://www.python.org
      
      3
      http://www.python.org#abc
      
      4
      http://www.python.org'
      
      5
      http://www.python.org#abc'
      
      6
      http://www.python.org?q=abc
      
      7
      http://www.python.org/#abc
      
      8
      http://a/b/c/d;p?q#f
      
      9
      https://www.python.org
      
      10
      https://www.python.org#abc
      
      11
      https://www.python.org?q=abc
      
      12
      https://www.python.org/#abc
      
      13
      https://a/b/c/d;p?q#f
      
      14
      http://www.python.org?q=abc
      
      15
      file:///tmp/junk.txt
      
      16
      imap://mail.python.org/mbox1
      
      17
      mms://wms.sys.hinet.net/cts/Drama/09006251100.asf
      
      18
      nfs://server/path/to/file.txt
      
      19
      svn+ssh://svn.zope.org/repos/main/ZConfig/trunk/
      
      20
      git+ssh://git@github.com/user/project.git
      
      21
      file:///tmp/junk.txt
      
      22
      imap://mail.python.org/mbox1
      
      23
      mms://wms.sys.hinet.net/cts/Drama/09006251100.asf
      
      24
      nfs://server/path/to/file.txt
      
      25
      http://www.python.org/#abc
      
      26
      svn+ssh://svn.zope.org/repos/main/ZConfig/trunk/
      
      27
      git+ssh://git@github.com/user/project.git
      
      28
      g:h
      
      29
      http://a/b/c/g
      
      30
      http://a/b/c/g/
      
      31
      http://a/g
      
      32
      http://g
      
      33
      http://a/b/c/g?y
      
      34
      http://a/b/c/g?y/./x
      
      35
      http://a/b/c/d;p?q#f
      
      36
      http://a/b/c/d;p?q#s
      
      37
      http://a/b/c/g#s
      
      38
      http://a/b/c/g#s/./x
      
      39
      http://a/b/c/g?y#s
      
      40
      http://a/b/c/g;x
      
      41
      http://a/b/c/g;x?y#s
      
      42
      http://a/b/c/
      
      43
      http://a/b/
      
      44
      https://www.python.org
      
      45
      http://a/b/g
      
      46
      http://a/
      
      47
      http://a/g
      
      48
      http://a/b/c/d;p?q#f
      
      49
      http://a/../g
      
      50
      g:h
      
      51
      http://a/b/c/g
      
      52
      http://a/b/c/g/
      
      53
      https://www.python.org#abc
      
      54
      http://g
      
      55
      http://a/b/c/g?y
      
      56
      http://a/b/c/d;p?q#s
      
      57
      http://a/b/c/g#s
      
      58
      http://a/b/c/g?y#s
      
      59
      http://a/b/c/g;x
      
      60
      http://a/b/c/g;x?y#s
      
      61
      https://www.python.org?q=abc
      
      62
      https://www.python.org/#abc
      
      63
      http://[::1]:5432/foo/
      
      64
      http://[dead:beef::1]:5432/foo/
      
      65
      http://[dead:beef::]:5432/foo/
      
      66
      http://[dead:beef:cafe:5417:affe:8FA3:deaf:feed]:5432/foo/
      
      67
      http://[::12.34.56.78]:5432/foo/
      
      68
      http://[::ffff:12.34.56.78]:5432/foo/
      
      69
      http://Test.python.org/foo/
      
      70
      http://12.34.56.78/foo/
      
      71
      http://[::1]/foo/
      
      72
      http://[dead:beef::1]/foo/
      
      73
      https://a/b/c/d;p?q#f
      
      74
      http://[dead:beef::]/foo/
      
      75
      http://[dead:beef:cafe:5417:affe:8FA3:deaf:feed]/foo/
      
      76
      http://[::12.34.56.78]/foo/
      
      77
      http://[::ffff:12.34.56.78]/foo/
      
      78
      http://Test.python.org:5432/foo/
      
      79
      http://12.34.56.78:5432/foo/
      
      80
      http://[::1]:5432/foo/
      
      81
      http://[dead:beef::1]:5432/foo/
      
      82
      http://[dead:beef::]:5432/foo/
      
      83
      http://[dead:beef:cafe:5417:affe:8FA3:deaf:feed]:5432/foo/
      
      84
      '''.strip().split('\n') if i.strip()]
      
      executed in 8ms, finished 05:21:15 2019-08-15
      . . .
      1
       
      1
      Unfortunately, as we detail in the paper, both the miners are unable to generalize well with the kind of inputs above. The problem is the lack of generalization of string tokens. Hence we use the ones below, which are generated using the _golden grammar_. This is the output of simply using the golden grammar to fuzz and generate 100 inputs. Captured here for deterministic reproduction.
      

      Unfortunately, as we detail in the paper, both the miners are unable to generalize well with the kind of inputs above. The problem is the lack of generalization of string tokens. Hence we use the ones below, which are generated using the golden grammar. This is the output of simply using the golden grammar to fuzz and generate 100 inputs. Captured here for deterministic reproduction.

      In [274]:
      xxxxxxxxxx
      
      102
       
      1
      urlparse_samples = [i.strip() for i in '''
      
      2
      https://user4:pass2@host2:30/folder//?key1=value3
      
      3
      ftp://user2:pass5@host2?key3=value1
      
      4
      ftp://host1/folder//
      
      5
      ftp://host4:30/folder
      
      6
      http://user1:pass4@host1/folder
      
      7
      https://user1:pass4@host4
      
      8
      ftp://host3:40/
      
      9
      http://user5:pass3@host1:10/
      
      10
      http://host4:10
      
      11
      ftp://host4/folder//?key4=value2
      
      12
      https://host5/folder
      
      13
      ftp://user4:pass5@host4/folder//folder//folder/
      
      14
      ftp://user5:pass2@host3
      
      15
      https://host2/
      
      16
      https://user4:pass3@host3/folder
      
      17
      http://host5:50
      
      18
      https://host3/folder?key3=value3
      
      19
      http://user5:pass3@host1/folder?key1=value4&key4=value2&key2=value1&key2=value3
      
      20
      https://user4:pass3@host1/folder
      
      21
      http://user3:pass3@host2:40/
      
      22
      ftp://host2/folder?key2=value3
      
      23
      https://user4:pass4@host2:50/folder/
      
      24
      https://user3:pass5@host4?key4=value1
      
      25
      ftp://user3:pass3@host1:40?key1=value3
      
      26
      https://user1:pass1@host3:50
      
      27
      ftps://user2:pass2@host3/
      
      28
      https://host4:30/folder
      
      29
      http://host5/folder/?key2=value2
      
      30
      ftps://host3:10/folder/
      
      31
      ftp://user4:pass4@host5/folder
      
      32
      http://user2:pass2@host4:10/folder//folder//folder/
      
      33
      ftp://host1:10/folder/
      
      34
      ftp://host3?key3=value1&key1=value3
      
      35
      ftp://user5:pass2@host4/folder//
      
      36
      http://host2
      
      37
      ftps://user5:pass3@host3:30
      
      38
      ftp://host5/folder
      
      39
      https://user2:pass2@host4:20/?key2=value4&key1=value2&key3=value3&key3=value2&key4=value3
      
      40
      https://host3/folder//folder//folder
      
      41
      ftp://user2:pass3@host4:50/
      
      42
      ftps://user5:pass5@host4/
      
      43
      ftps://user3:pass3@host5?key3=value3
      
      44
      ftp://host4?key1=value3&key3=value3&key3=value1
      
      45
      https://host3/?key4=value2&key1=value2&key4=value3&key2=value4
      
      46
      ftps://host1/folder//
      
      47
      ftp://host5/folder//
      
      48
      https://user2:pass1@host5:10/folder//
      
      49
      http://user5:pass5@host2:10/folder
      
      50
      https://host5/folder
      
      51
      ftps://user5:pass3@host4:40/?key1=value3
      
      52
      http://user1:pass3@host4/folder//?key4=value4&key3=value3
      
      53
      http://user2:pass2@host5:50/folder?key4=value3&key4=value2
      
      54
      http://host3?key3=value3&key2=value2
      
      55
      https://user3:pass3@host2:20/folder
      
      56
      https://host5/folder?key2=value1&key3=value2&key1=value4&key3=value4&key3=value1&key1=value2&key1=value2
      
      57
      ftp://user2:pass5@host5:40/?key4=value4
      
      58
      https://user3:pass4@host2:20/
      
      59
      ftps://host3:30/?key3=value1
      
      60
      ftp://host3/folder
      
      61
      ftps://user1:pass1@host5:20/?key3=value1
      
      62
      https://user4:pass5@host3?key4=value2
      
      63
      ftp://host4:40/folder?key3=value1
      
      64
      ftps://host2/folder//folder
      
      65
      https://host2
      
      66
      https://user2:pass5@host5:50?key1=value4&key1=value1&key2=value1&key2=value1
      
      67
      https://user4:pass5@host1/?key1=value2&key1=value1
      
      68
      http://host4:40/folder?key4=value3&key4=value2
      
      69
      http://host1:40
      
      70
      ftps://host3:30/
      
      71
      ftps://host1/folder/?key4=value1&key1=value4
      
      72
      http://user1:pass1@host1:10/folder/?key2=value2&key2=value3&key3=value4
      
      73
      http://host3/folder?key2=value2
      
      74
      ftps://user4:pass3@host3:50/?key1=value4
      
      75
      ftp://host2/folder//folder
      
      76
      ftp://user2:pass4@host4:40/folder?key3=value2&key2=value1&key2=value2&key4=value3&key3=value3&key3=value1
      
      77
      ftps://user4:pass5@host4:50?key4=value2
      
      78
      https://host3:10
      
      79
      ftp://user1:pass3@host3:10/folder/
      
      80
      ftps://host4:30/
      
      81
      ftp://user4:pass2@host1/folder/?key3=value2&key2=value4&key1=value3&key3=value2
      
      82
      https://host2/folder?key3=value3&key4=value4&key2=value2
      
      83
      ftp://host2:50/?key2=value4&key2=value4&key4=value1&key2=value2&key2=value3&key4=value1
      
      84
      ftps://user2:pass4@host2/
      
      85
      ftps://host3:40/
      
      86
      ftps://user4:pass5@host2/
      
      87
      ftp://host2:10/?key3=value3&key4=value1
      
      88
      http://host2/folder/?key3=value1&key2=value4
      
      89
      https://host5/folder?key4=value2
      
      90
      https://user3:pass4@host1:20
      
      91
      ftp://user3:pass3@host5/
      
      92
      https://user1:pass4@host5/
      
      93
      https://user3:pass2@host1/folder//
      
      94
      ftps://host5:30?key1=value1&key2=value3&key3=value2&key2=value3&key4=value2&key2=value3
      
      95
      ftps://user2:pass5@host3:30?key3=value2
      
      96
      ftps://host4:10/?key1=value1&key4=value3
      
      97
      https://host2:30
      
      98
      https://host5:40/folder
      
      99
      http://user2:pass4@host5:50/folder
      
      100
      ftp://user5:pass1@host3:50?key3=value2&key1=value4
      
      101
      ftp://host1/folder//folder
      
      102
      '''.strip().split('\n') if i.strip()]
      
      executed in 6ms, finished 05:21:15 2019-08-15
      . . .
      1
       
      1
      #### Mimid
      

      2.2.4.3  Mimid¶

      In [275]:
      xxxxxxxxxx
      
      4
       
      1
      %%time
      
      2
      with timeit() as t:
      
      3
          urlparse_grammar = accio_grammar('urlparse.py', VARS['urlparse_src'], urlparse_samples)
      
      4
      Mimid_t['urlparse.py'] = t.runtime
      
      executed in 6.13s, finished 05:21:21 2019-08-15
      CPU times: user 356 ms, sys: 265 ms, total: 621 ms
      Wall time: 6.13 s
      
      . . .
      In [276]:
      xxxxxxxxxx
      
      1
       
      1
      save_grammar(urlparse_grammar, 'mimid', 'urlparse')
      
      executed in 11ms, finished 05:21:21 2019-08-15
      Out[276]:
      {'<START>': ['<urlparse-1>'],
       '<urlparse-1>': ['<urlsplit-1>', '<urlsplit-1>/'],
       '<urlsplit-1>': ['<urlsplit-7>', '<urlsplit-7><urlsplit-1-c>'],
       '<urlsplit-7>': ['<urlsplit-20>', 'f', 'http:<urlsplit-18>', 'https'],
       '<urlsplit-1-c>': ['://<urlsplit-16>',
        'host1:40',
        'host2',
        'host4:10',
        'host5:50',
        's<urlsplit-8-c>',
        'tp://<urlsplit-16>',
        'tp<urlsplit-13-c>',
        'tps<urlsplit-4-c>'],
       '<urlsplit-20>': ['http', 'http://<_splitnetloc-0-c>'],
       '<urlsplit-18>': ['//', '//<urlsplit-19>'],
       '<_splitnetloc-0-c>': ['host1',
        'host1/folder//',
        'host1/folder//folder',
        'host1:10/folder/',
        'host2',
        'host2/folder//folder',
        'host2:10',
        'host2:50',
        'host3',
        'host3/folder',
        'host3/folder//folder//folder',
        'host3:10/folder/',
        'host3:30',
        'host3:40',
        'host4',
        'host4:10',
        'host4:30',
        'host4:30/folder',
        'host4:40',
        'host5',
        'host5/folder',
        'host5/folder//',
        'host5:30',
        'host5:40/folder',
        'user1:pass1@host1:10',
        'user1:pass1@host5:20',
        'user1:pass3@host3:10/folder/',
        'user1:pass3@host4',
        'user1:pass4@host1/folder',
        'user1:pass4@host5',
        'user2:pass1@host5:10/folder//',
        'user2:pass2@host3',
        'user2:pass2@host4:10/folder//folder//folder/',
        'user2:pass2@host4:20',
        'user2:pass2@host5:50',
        'user2:pass3@host4:50',
        'user2:pass4@host2',
        'user2:pass4@host4:40',
        'user2:pass4@host5:50/folder',
        'user2:pass5@host2',
        'user2:pass5@host3:30',
        'user2:pass5@host5:40',
        'user2:pass5@host5:50',
        'user3:pass2@host1/folder//',
        'user3:pass3@host1:40',
        'user3:pass3@host2:20/folder',
        'user3:pass3@host2:40',
        'user3:pass3@host5',
        'user3:pass4@host2:20',
        'user3:pass5@host4',
        'user4:pass2@host1',
        'user4:pass2@host2:30',
        'user4:pass3@host1/folder',
        'user4:pass3@host3/folder',
        'user4:pass3@host3:50',
        'user4:pass4@host2:50/folder/',
        'user4:pass4@host5/folder',
        'user4:pass5@host1',
        'user4:pass5@host2',
        'user4:pass5@host3',
        'user4:pass5@host4/folder//folder//folder/',
        'user4:pass5@host4:50',
        'user5:pass1@host3:50',
        'user5:pass2@host4/folder//',
        'user5:pass3@host1',
        'user5:pass3@host1:10',
        'user5:pass3@host4:40',
        'user5:pass5@host2:10/folder',
        'user5:pass5@host4'],
       '<urlsplit-19>': ['<_splitnetloc-0-c>', '<_splitnetloc-0-c><urlsplit-2>'],
       '<urlsplit-2>': ['/folder//?key4=value4&key3=value3',
        '/folder/?key2=value2',
        '/folder/?key2=value2&key2=value3&key3=value4',
        '/folder/?key3=value1&key2=value4',
        '/folder?key1=value4&key4=value2&key2=value1&key2=value3',
        '/folder?key2=value2',
        '/folder?key4=value3&key4=value2',
        '?key3=value3&key2=value2'],
       '<urlsplit-16>': ['<_splitnetloc-0-c>', '<_splitnetloc-0-c>/<urlsplit>'],
       '<urlsplit-8-c>': ['://host2',
        '://host2:30',
        '://host3:10',
        '://user1:pass1@host3:50',
        '://user1:pass4@host4',
        '://user3:pass4@host1:20',
        '<urlsplit-9>'],
       '<urlsplit-13-c>': ['://user5:pass2@host3', '<urlsplit-9>'],
       '<urlsplit-4-c>': ['://<urlsplit-6>', '://user5:pass3@host3:30'],
       '<urlsplit>': ['/?key1=value1&key4=value3',
        '/?key1=value3',
        '/?key1=value4',
        '/?key3=value1',
        '/folder//?key1=value3',
        '/folder//?key4=value2',
        '/folder/?key3=value2&key2=value4&key1=value3&key3=value2',
        '/folder/?key4=value1&key1=value4',
        '/folder?key2=value1&key3=value2&key1=value4&key3=value4&key3=value1&key1=value2&key1=value2',
        '/folder?key2=value3',
        '/folder?key3=value1',
        '/folder?key3=value2&key2=value1&key2=value2&key4=value3&key3=value3&key3=value1',
        '/folder?key3=value3',
        '/folder?key3=value3&key4=value4&key2=value2',
        '/folder?key4=value2',
        '?key1=value1&key2=value3&key3=value2&key2=value3&key4=value2&key2=value3',
        '?key1=value2&key1=value1',
        '?key1=value3',
        '?key1=value3&key3=value3&key3=value1',
        '?key1=value4&key1=value1&key2=value1&key2=value1',
        '?key2=value4&key1=value2&key3=value3&key3=value2&key4=value3',
        '?key2=value4&key2=value4&key4=value1&key2=value2&key2=value3&key4=value1',
        '?key3=value1',
        '?key3=value1&key1=value3',
        '?key3=value2',
        '?key3=value2&key1=value4',
        '?key3=value3',
        '?key3=value3&key4=value1',
        '?key4=value1',
        '?key4=value2',
        '?key4=value2&key1=value2&key4=value3&key2=value4',
        '?key4=value4'],
       '<urlsplit-9>': ['://<urlsplit-10>'],
       '<urlsplit-10>': ['<_splitnetloc-0-c>', '<_splitnetloc-0-c><urlsplit>'],
       '<urlsplit-6>': ['<_splitnetloc-0-c>', '<_splitnetloc-0-c><urlsplit-6-c>'],
       '<urlsplit-6-c>': ['/', '<urlsplit>']}
      
      . . .
      In [277]:
      xxxxxxxxxx
      
      4
       
      1
      if 'urlparse' in CHECK:
      
      2
          result = check_precision('urlparse.py', urlparse_grammar)
      
      3
          Mimid_p['urlparse.py'] = result
      
      4
          print(result)
      
      executed in 21.4s, finished 05:21:42 2019-08-15
      (1000, 1000)
      
      . . .
      In [278]:
      xxxxxxxxxx
      
      1
       
      1
      import subjects.urlparse
      
      executed in 17ms, finished 05:21:42 2019-08-15
      . . .
      In [279]:
      xxxxxxxxxx
      
      4
       
      1
      if 'urlparse' in CHECK:
      
      2
          result = check_recall(urlparse_golden, urlparse_grammar, subjects.urlparse.main)
      
      3
          Mimid_r['urlparse.py'] = result
      
      4
          print(result)
      
      executed in 4.59s, finished 05:21:47 2019-08-15
      (153, 1000)
      
      . . .
      1
       
      1
      #### Autogram
      

      2.2.4.4  Autogram¶

      In [280]:
      xxxxxxxxxx
      
      4
       
      1
      %%time
      
      2
      with timeit() as t:
      
      3
          autogram_urlparse_grammar_t = recover_grammar_with_taints('urlparse.py', VARS['urlparse_src'], urlparse_samples)
      
      4
      Autogram_t['urlparse.py'] = t.runtime
      
      executed in 48.0s, finished 05:22:35 2019-08-15
      CPU times: user 12.7 ms, sys: 6.55 ms, total: 19.3 ms
      Wall time: 48 s
      
      . . .
      In [281]:
      xxxxxxxxxx
      
      1
       
      1
      save_grammar(autogram_urlparse_grammar_t, 'autogram_t', 'urlparse')
      
      executed in 42ms, finished 05:22:35 2019-08-15
      Out[281]:
      {'<START>': ['<create@27:self>'],
       '<create@27:self>': ['<__init__@15:self>:<__init__@1047:self._ostr>',
        '<__init__@15:self>:<create@27:self>',
        '<__init__@15:self><__init__@1047:self._ostr>',
        '<__init__@15:self><__init__@1047:self._ostr>/',
        '<__init__@15:self><__init__@1047:self._ostr><urlsplit@434:url>',
        '<__init__@15:self><__init__@1047:self._ostr><urlsplit@458:url>',
        '<__init__@15:self>?<_split_helper@1259:item>',
        '?<_split_helper@1259:item>'],
       '<__init__@15:self>': ['//',
        '<__new__@1:path>/',
        '<__new__@1:scheme>',
        '<_split_helper@1259:item>',
        '<urlsplit@434:url>/',
        '<urlsplit@446:c><urlsplit@446:c><urlsplit@446:c>',
        '<urlsplit@446:c><urlsplit@446:c><urlsplit@446:c><urlsplit@446:c>',
        '<urlsplit@446:c><urlsplit@446:c>t<urlsplit@446:c><urlsplit@446:c>',
        '<urlsplit@458:url>/'],
       '<__init__@1047:self._ostr>': ['<__new__@1:netloc>', '<create@27:self>'],
       '<urlsplit@434:url>': ['<__new__@1:path>', '<create@27:self>'],
       '<urlsplit@458:url>': ['<__new__@1:path>', '<create@27:self>'],
       '<_split_helper@1259:item>': ['/', '<__new__@1:path>', '<__new__@1:query>'],
       '<__new__@1:path>': ['/',
        '/folder',
        '/folder/',
        '/folder//',
        '/folder//folder',
        '/folder//folder//folder',
        '/folder//folder//folder/',
        '<__new__@1:path>'],
       '<__new__@1:scheme>': ['<__new__@1:scheme>', 'http'],
       '<urlsplit@446:c>': ['f', 'h', 'p', 's', 't'],
       '<__new__@1:netloc>': ['host1',
        'host1:10',
        'host1:40',
        'host2',
        'host2:10',
        'host2:30',
        'host2:50',
        'host3',
        'host3:10',
        'host3:30',
        'host3:40',
        'host4',
        'host4:10',
        'host4:30',
        'host4:40',
        'host5',
        'host5:30',
        'host5:40',
        'host5:50',
        'user1:pass1@host1:10',
        'user1:pass1@host3:50',
        'user1:pass1@host5:20',
        'user1:pass3@host3:10',
        'user1:pass3@host4',
        'user1:pass4@host1',
        'user1:pass4@host4',
        'user1:pass4@host5',
        'user2:pass1@host5:10',
        'user2:pass2@host3',
        'user2:pass2@host4:10',
        'user2:pass2@host4:20',
        'user2:pass2@host5:50',
        'user2:pass3@host4:50',
        'user2:pass4@host2',
        'user2:pass4@host4:40',
        'user2:pass4@host5:50',
        'user2:pass5@host2',
        'user2:pass5@host3:30',
        'user2:pass5@host5:40',
        'user2:pass5@host5:50',
        'user3:pass2@host1',
        'user3:pass3@host1:40',
        'user3:pass3@host2:20',
        'user3:pass3@host2:40',
        'user3:pass3@host5',
        'user3:pass4@host1:20',
        'user3:pass4@host2:20',
        'user3:pass5@host4',
        'user4:pass2@host1',
        'user4:pass2@host2:30',
        'user4:pass3@host1',
        'user4:pass3@host3',
        'user4:pass3@host3:50',
        'user4:pass4@host2:50',
        'user4:pass4@host5',
        'user4:pass5@host1',
        'user4:pass5@host2',
        'user4:pass5@host3',
        'user4:pass5@host4',
        'user4:pass5@host4:50',
        'user5:pass1@host3:50',
        'user5:pass2@host3',
        'user5:pass2@host4',
        'user5:pass3@host1',
        'user5:pass3@host1:10',
        'user5:pass3@host3:30',
        'user5:pass3@host4:40',
        'user5:pass5@host2:10',
        'user5:pass5@host4'],
       '<__new__@1:query>': ['<__new__@1:query>',
        'key1=value1&key2=value3&key3=value2&key2=value3&key4=value2&key2=value3',
        'key1=value1&key4=value3',
        'key1=value2&key1=value1',
        'key1=value3',
        'key1=value3&key3=value3&key3=value1',
        'key1=value4',
        'key1=value4&key1=value1&key2=value1&key2=value1',
        'key1=value4&key4=value2&key2=value1&key2=value3',
        'key2=value1&key3=value2&key1=value4&key3=value4&key3=value1&key1=value2&key1=value2',
        'key2=value2',
        'key2=value2&key2=value3&key3=value4',
        'key2=value3',
        'key2=value4&key1=value2&key3=value3&key3=value2&key4=value3',
        'key2=value4&key2=value4&key4=value1&key2=value2&key2=value3&key4=value1',
        'key3=value1',
        'key3=value1&key1=value3',
        'key3=value1&key2=value4',
        'key3=value2',
        'key3=value2&key1=value4',
        'key3=value2&key2=value1&key2=value2&key4=value3&key3=value3&key3=value1',
        'key3=value2&key2=value4&key1=value3&key3=value2',
        'key3=value3',
        'key3=value3&key2=value2',
        'key3=value3&key4=value1',
        'key3=value3&key4=value4&key2=value2',
        'key4=value1',
        'key4=value1&key1=value4',
        'key4=value2',
        'key4=value2&key1=value2&key4=value3&key2=value4',
        'key4=value3&key4=value2',
        'key4=value4',
        'key4=value4&key3=value3']}
      
      . . .
      In [282]:
      xxxxxxxxxx
      
      4
       
      1
      if 'urlparse' in CHECK:
      
      2
          result = check_precision('urlparse.py', autogram_urlparse_grammar_t)
      
      3
          Autogram_p['urlparse.py'] = result
      
      4
          print(result)
      
      executed in 39.7s, finished 05:23:15 2019-08-15
      (1000, 1000)
      
      . . .
      In [283]:
      xxxxxxxxxx
      
      4
       
      1
      if 'urlparse' in CHECK:
      
      2
          result = check_recall(urlparse_golden, autogram_urlparse_grammar_t, subjects.urlparse.main)
      
      3
          Autogram_r['urlparse.py'] = result
      
      4
          print(result)
      
      executed in 18.2s, finished 05:23:33 2019-08-15
      (277, 1000)
      
      . . .
      1
       
      1
      ### Netrc
      

      2.2.5  Netrc¶

      1
       
      1
      #### Golden Grammar
      

      2.2.5.1  Golden Grammar¶

      In [284]:
      xxxxxxxxxx
      
      51
       
      1
      netrc_golden = {
      
      2
        "<START>": [
      
      3
          "<entries>"
      
      4
        ],
      
      5
        "<entries>": [
      
      6
          "<entry><whitespace><entries>",
      
      7
          "<entry>"
      
      8
        ],
      
      9
        "<entry>": [
      
      10
          "machine<whitespace><mvalue><whitespace><fills>",
      
      11
          "default<whitespace<whitespace><fills>"
      
      12
        ],
      
      13
        "<whitespace>": [
      
      14
          " "
      
      15
        ],
      
      16
        "<mvalue>": [
      
      17
          "m1",
      
      18
          "m2",
      
      19
          "m3"
      
      20
        ],
      
      21
        "<accvalue>": [
      
      22
          "a1",
      
      23
          "a2",
      
      24
          "a3"
      
      25
        ],
      
      26
        "<uservalue>": [
      
      27
          "u1",
      
      28
          "u2",
      
      29
          "u3"
      
      30
        ],
      
      31
        "<passvalue>": [
      
      32
          "pwd1",
      
      33
          "pwd2",
      
      34
          "pwd3"
      
      35
        ],
      
      36
        "<lvalue>": [
      
      37
          "l1",
      
      38
          "l2",
      
      39
          "l3"
      
      40
        ],
      
      41
        "<fills>": [
      
      42
          "<fill>",
      
      43
          "<fill><whitespace><fills>"
      
      44
        ],
      
      45
        "<fill>": [
      
      46
          "account<whitespace><accvalue>",
      
      47
          "username<whitespace><uservalue>",
      
      48
          "password<whitespace><passvalue>",
      
      49
          "login<whitespace><lvalue>"
      
      50
        ]
      
      51
      }
      
      executed in 7ms, finished 05:23:33 2019-08-15
      . . .
      1
       
      1
      #### Samples
      

      2.2.5.2  Samples¶

      In [285]:
      xxxxxxxxxx
      
      34
       
      1
      netrc_samples = [i.strip().replace('\n', ' ') for i in [
      
      2
      '''
      
      3
      machine m1 login u1 password pwd1
      
      4
      ''','''
      
      5
      machine m2 login u1 password pwd2
      
      6
      ''','''
      
      7
      default login u1 password pwd1
      
      8
      ''','''
      
      9
      machine m1 login u2 password pwd1
      
      10
      ''','''
      
      11
      machine m2 login u2 password pwd2 machine m1 login l1 password pwd1
      
      12
      ''','''
      
      13
      machine m1 login u1 password pwd1 machine m2 login l2 password pwd2
      
      14
      ''','''
      
      15
      machine m2 password pwd2 login u2
      
      16
      ''','''
      
      17
      machine m1 password pwd1 login u1
      
      18
      ''','''
      
      19
      machine m2 login u2 password pwd1
      
      20
      ''','''
      
      21
      default login u2 password pwd3
      
      22
      ''','''
      
      23
      machine m2 login u2 password pwd1 machine m3 login u3 password pwd1 machine m1 login u1 password pwd2
      
      24
      ''','''
      
      25
      machine m2 login u2 password pwd3
      
      26
      machine m1 login u1 password pwd1
      
      27
      ''','''
      
      28
      default login u1 password pwd3
      
      29
      machine m2 login u1 password pwd1
      
      30
      ''','''
      
      31
      machine m1 login l1 password p1
      
      32
      machine m2 login l2 password p2
      
      33
      default login m1 password p1
      
      34
      ''']]
      
      executed in 7ms, finished 05:23:33 2019-08-15
      . . .
      1
       
      1
      As with `urlparse`, we had to use a restricted set of keywords with _netrc_. The below words are produced from fuzzing the golden grammar, captured here for deterministic reproduction.
      

      As with urlparse, we had to use a restricted set of keywords with netrc. The below words are produced from fuzzing the golden grammar, captured here for deterministic reproduction.

      In [286]:
      xxxxxxxxxx
      
      102
       
      1
      netrc_samples = [i.strip() for i in '''
      
      2
      machine m1 password pwd3 login l3
      
      3
      machine m1 login l3 account a3 login l1 password pwd2
      
      4
      machine m2 password pwd2
      
      5
      machine m2 password pwd2 account a2
      
      6
      machine m2 password pwd3
      
      7
      machine m2 password pwd1
      
      8
      machine m1 login l3 password pwd1
      
      9
      machine m2 password pwd3
      
      10
      machine m1 password pwd2 account a1 account a2
      
      11
      machine m2 password pwd3
      
      12
      machine m2 account a1 password pwd3
      
      13
      machine m3 login l3 account a2 password pwd3
      
      14
      machine m2 password pwd2 login l3 password pwd2 password pwd2
      
      15
      machine m3 password pwd2 login l3
      
      16
      machine m3 login l3 account a3 account a2 password pwd3
      
      17
      machine m1 password pwd2
      
      18
      machine m2 account a3 password pwd3
      
      19
      machine m3 password pwd2
      
      20
      machine m3 password pwd1 account a1
      
      21
      machine m2 password pwd1
      
      22
      machine m1 account a1 password pwd1
      
      23
      machine m2 login l1 login l2 account a2 login l3 password pwd2 password pwd2 password pwd2
      
      24
      machine m3 account a3 login l3 account a1 password pwd3
      
      25
      machine m1 password pwd1
      
      26
      machine m2 password pwd3
      
      27
      machine m2 password pwd3
      
      28
      machine m1 account a1 password pwd1 account a1 password pwd3
      
      29
      machine m3 password pwd3
      
      30
      machine m3 password pwd2
      
      31
      machine m2 account a1 account a1 account a2 password pwd2 account a1
      
      32
      machine m3 password pwd1 login l2 login l1
      
      33
      machine m1 account a3 account a3 password pwd1 machine m3 password pwd2
      
      34
      machine m1 login l1 password pwd1
      
      35
      machine m3 password pwd2 login l1 machine m1 password pwd2
      
      36
      machine m3 account a2 password pwd1
      
      37
      machine m1 password pwd3
      
      38
      machine m3 login l2 account a2 password pwd2
      
      39
      machine m2 password pwd3 machine m2 account a1 login l3 password pwd3 password pwd2
      
      40
      machine m1 password pwd2
      
      41
      machine m1 password pwd2
      
      42
      machine m1 password pwd2
      
      43
      machine m2 password pwd3 password pwd2
      
      44
      machine m2 login l1 password pwd1 account a1
      
      45
      machine m3 password pwd1
      
      46
      machine m2 password pwd3 password pwd1
      
      47
      machine m1 password pwd3 password pwd3 password pwd1
      
      48
      machine m2 password pwd1 password pwd1
      
      49
      machine m2 login l2 account a3 password pwd3
      
      50
      machine m1 password pwd1
      
      51
      machine m1 account a3 password pwd3 account a2 password pwd2 account a3 account a3 account a3
      
      52
      machine m3 password pwd3 password pwd3 machine m2 password pwd3
      
      53
      machine m2 password pwd2 login l2 login l1
      
      54
      machine m1 login l3 password pwd2
      
      55
      machine m2 login l2 password pwd1
      
      56
      machine m2 account a3 password pwd2
      
      57
      machine m1 account a2 password pwd1
      
      58
      machine m3 login l1 password pwd2 account a2
      
      59
      machine m1 password pwd3
      
      60
      machine m3 password pwd2
      
      61
      machine m1 password pwd3 password pwd3 password pwd1 machine m2 password pwd3
      
      62
      machine m1 account a2 account a1 login l2 password pwd2
      
      63
      machine m1 login l1 password pwd2 password pwd2 login l3
      
      64
      machine m2 password pwd1 password pwd2
      
      65
      machine m1 password pwd3 account a3
      
      66
      machine m1 login l1 login l2 password pwd2
      
      67
      machine m1 account a1 password pwd1 login l2
      
      68
      machine m2 password pwd1 login l3
      
      69
      machine m2 password pwd2 password pwd1 password pwd3
      
      70
      machine m1 password pwd1 account a1 account a2 login l1
      
      71
      machine m1 password pwd3
      
      72
      machine m2 login l3 password pwd3
      
      73
      machine m3 login l2 login l2 password pwd1 login l2
      
      74
      machine m2 password pwd1
      
      75
      machine m1 password pwd1 login l3 account a2 login l3 password pwd1
      
      76
      machine m3 password pwd3
      
      77
      machine m3 password pwd1 account a1
      
      78
      machine m2 login l3 password pwd1 account a3
      
      79
      machine m3 password pwd3
      
      80
      machine m2 password pwd1
      
      81
      machine m1 login l3 password pwd1 password pwd1
      
      82
      machine m3 password pwd3
      
      83
      machine m2 password pwd2 login l3 login l2 login l1 account a1
      
      84
      machine m1 password pwd1
      
      85
      machine m2 password pwd2 login l3
      
      86
      machine m2 password pwd2
      
      87
      machine m2 password pwd1
      
      88
      machine m3 password pwd3
      
      89
      machine m1 password pwd1
      
      90
      machine m2 account a3 password pwd1
      
      91
      machine m2 login l1 password pwd3
      
      92
      machine m3 password pwd2 login l1 machine m2 password pwd1
      
      93
      machine m2 login l2 account a2 password pwd1 login l2 account a1
      
      94
      machine m1 password pwd2
      
      95
      machine m3 login l1 password pwd1
      
      96
      machine m3 account a2 password pwd2
      
      97
      machine m2 login l1 password pwd3 login l2 account a2
      
      98
      machine m3 account a1 password pwd2
      
      99
      machine m3 login l3 login l3 password pwd1 password pwd1
      
      100
      machine m3 password pwd2 password pwd2 password pwd2 account a2
      
      101
      machine m3 password pwd1
      
      102
      '''.strip().split('\n') if i.strip()]
      
      executed in 7ms, finished 05:23:33 2019-08-15
      . . .
      1
       
      1
      #### Mimid
      

      2.2.5.3  Mimid¶

      In [287]:
      xxxxxxxxxx
      
      4
       
      1
      %%time
      
      2
      with timeit() as t:
      
      3
          netrc_grammar = accio_grammar('netrc.py', VARS['netrc_src'], netrc_samples)
      
      4
      Mimid_t['netrc.py'] = t.runtime
      
      executed in 30.4s, finished 05:24:03 2019-08-15
      CPU times: user 1.23 s, sys: 1.72 s, total: 2.95 s
      Wall time: 30.4 s
      
      . . .
      In [288]:
      xxxxxxxxxx
      
      1
       
      1
      save_grammar(netrc_grammar, 'mimid', 'netrc')
      
      executed in 11ms, finished 05:24:03 2019-08-15
      Out[288]:
      {'<START>': ['<_parse__netrc-0-c>'],
       '<_parse__netrc-0-c>': ['<_parse__netrc-29>',
        'machine <_parse__netrc-22><_parse__netrc-59><_parse__netrc-29>',
        'machine m2 password pwd3 machine m2 <_parse__netrc-23>password pwd2'],
       '<_parse__netrc-29>': ['machine <_parse__netrc-67><_parse__netrc-32>',
        'machine <_parse__netrc-67><_parse__netrc-56>'],
       '<_parse__netrc-22>': ['m1 ', 'm3 '],
       '<_parse__netrc-59>': ['<_parse__netrc-60>',
        '<_parse__netrc-64><_parse__netrc-65>'],
       '<_parse__netrc-23>': ['account a1 ', 'login l3 ', 'password pwd3 '],
       '<_parse__netrc-67>': ['m1 ', 'm2 ', 'm3 '],
       '<_parse__netrc-32>': ['<_parse__netrc-34><_parse__netrc-35>',
        '<_parse__netrc-47><_parse__netrc-48>',
        '<_parse__netrc>'],
       '<_parse__netrc-56>': ['<_parse__netrc-8>',
        '<_parse__netrc-8><_parse__netrc-56>'],
       '<_parse__netrc-34>': ['<_parse__netrc-5>',
        '<_parse__netrc-5><_parse__netrc-34>'],
       '<_parse__netrc-35>': ['<_parse__netrc-37><_parse__netrc-38>',
        '<_parse__netrc>'],
       '<_parse__netrc-47>': ['<_parse__netrc-8>',
        '<_parse__netrc-8><_parse__netrc-47>'],
       '<_parse__netrc-48>': ['<_parse__netrc-16>',
        '<_parse__netrc-50><_parse__netrc-51>'],
       '<_parse__netrc>': ['password <_parse__netrc-19>'],
       '<_parse__netrc-5>': ['account <_parse__netrc-28>',
        'login <_parse__netrc-27>'],
       '<_parse__netrc-28>': ['a1 ', 'a2 ', 'a3 '],
       '<_parse__netrc-27>': ['l1 ', 'l2 ', 'l3 '],
       '<_parse__netrc-37>': ['<_parse__netrc-8>',
        '<_parse__netrc-8><_parse__netrc-37>'],
       '<_parse__netrc-38>': ['<_parse__netrc-16>',
        '<_parse__netrc-40><_parse__netrc-41>'],
       '<_parse__netrc-8>': ['password <_parse__netrc-2>'],
       '<_parse__netrc-2>': ['pwd1', 'pwd1 ', 'pwd2', 'pwd2 ', 'pwd3', 'pwd3 '],
       '<_parse__netrc-16>': ['<_parse__netrc>',
        'account <_parse__netrc-20>',
        'login <_parse__netrc-9>'],
       '<_parse__netrc-40>': ['<_parse__netrc-5>',
        '<_parse__netrc-5><_parse__netrc-40>'],
       '<_parse__netrc-41>': ['<_parse__netrc-16>',
        '<_parse__netrc-43><_parse__netrc-45><_parse__netrc-16>'],
       '<_parse__netrc-20>': ['a1', 'a2', 'a3'],
       '<_parse__netrc-9>': ['l1', 'l2', 'l3'],
       '<_parse__netrc-43>': ['<_parse__netrc-8>',
        '<_parse__netrc-8><_parse__netrc-43>'],
       '<_parse__netrc-45>': ['<_parse__netrc-5>',
        '<_parse__netrc-5><_parse__netrc-45>'],
       '<_parse__netrc-50>': ['<_parse__netrc-5>',
        '<_parse__netrc-5><_parse__netrc-50>'],
       '<_parse__netrc-51>': ['<_parse__netrc-16>',
        '<_parse__netrc-8><_parse__netrc-16>'],
       '<_parse__netrc-19>': ['pwd1', 'pwd2', 'pwd3'],
       '<_parse__netrc-60>': ['<_parse__netrc-61>',
        '<_parse__netrc-61><_parse__netrc-62>'],
       '<_parse__netrc-64>': ['<_parse__netrc-68>',
        '<_parse__netrc-68><_parse__netrc-64>'],
       '<_parse__netrc-65>': ['<_parse__netrc-4>',
        '<_parse__netrc-4><_parse__netrc-65>'],
       '<_parse__netrc-61>': ['<_parse__netrc-4>',
        '<_parse__netrc-4><_parse__netrc-61>'],
       '<_parse__netrc-62>': ['<_parse__netrc-68>',
        '<_parse__netrc-68><_parse__netrc-62>'],
       '<_parse__netrc-4>': ['password <_parse__netrc-66>'],
       '<_parse__netrc-66>': ['pwd1 ', 'pwd2 ', 'pwd3 '],
       '<_parse__netrc-68>': ['account a3 ', 'login l1 ']}
      
      . . .
      In [289]:
      xxxxxxxxxx
      
      4
       
      1
      if 'netrc' in CHECK:
      
      2
          result = check_precision('netrc.py', netrc_grammar)
      
      3
          Mimid_p['netrc.py'] = result
      
      4
          print(result)
      
      executed in 25.6s, finished 05:24:29 2019-08-15
      (773, 1000)
      
      . . .
      In [290]:
      xxxxxxxxxx
      
      2
       
      1
      !cp build/mylex.py .
      
      2
      !cp build/myio.py .
      
      executed in 288ms, finished 05:24:29 2019-08-15
      . . .
      In [291]:
      xxxxxxxxxx
      
      1
       
      1
      import subjects.netrc
      
      executed in 13ms, finished 05:24:29 2019-08-15
      . . .
      In [292]:
      xxxxxxxxxx
      
      4
       
      1
      if 'netrc' in CHECK:
      
      2
          result = check_recall(netrc_golden, netrc_grammar, subjects.netrc.main)
      
      3
          Mimid_r['netrc.py'] = result
      
      4
          print(result)
      
      executed in 37.7s, finished 05:25:07 2019-08-15
      (949, 1000)
      
      . . .
      1
       
      1
      #### Autogram
      

      2.2.5.4  Autogram¶

      In [293]:
      xxxxxxxxxx
      
      4
       
      1
      %%time
      
      2
      with timeit() as t:
      
      3
          autogram_netrc_grammar_t = recover_grammar_with_taints('netrc.py', VARS['netrc_src'], netrc_samples)
      
      4
      Autogram_t['netrc.py'] = t.runtime
      
      executed in 2m 59s, finished 05:28:06 2019-08-15
      CPU times: user 15.5 ms, sys: 8.65 ms, total: 24.2 ms
      Wall time: 2min 59s
      
      . . .
      In [294]:
      xxxxxxxxxx
      
      1
       
      1
      save_grammar(autogram_netrc_grammar_t, 'autogram_t', 'netrc')
      
      executed in 11ms, finished 05:28:06 2019-08-15
      Out[294]:
      {'<START>': ['<create@27:self>'],
       '<create@27:self>': ['<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@83:login>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@85:account> login l3 password pwd1',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@83:login> <create@27:self>chine <_parse@47:entryname> password <_parse@108:password>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@83:login> <create@27:self>chine <_parse@47:entryname> password pwd2',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@83:login> login <_parse@83:login>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@83:login> login <_parse@83:login> login <_parse@83:login> <_parse@70:tt> <_parse@85:account>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@83:login> password pwd2 password pwd2',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@85:account>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@85:account> account <_parse@85:account>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@85:account> account <_parse@85:account> <_parse@70:tt> <_parse@83:login>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> <create@27:self>chine m2 <_parse@70:tt> <_parse@85:account> <_parse@70:tt> <_parse@83:login> password pwd3 password <_parse@108:password>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> password <_parse@108:password>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> password <_parse@108:password> password <_parse@108:password>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> password pwd1',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> password pwd2 password pwd2 <_parse@70:tt> <_parse@85:account>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> password pwd3 <create@27:self>chine <_parse@47:entryname> password pwd3',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> password pwd3 password <_parse@108:password>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@108:password> password pwd3 password <_parse@108:password> <create@27:self>chine <_parse@47:entryname> password pwd3',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@108:password>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@85:account>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@108:password> login <_parse@83:login> <_parse@70:tt> <_parse@85:account>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@108:password> password pwd1',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@108:password> password pwd2 login <_parse@83:login>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@85:account> <_parse@70:tt> <_parse@108:password>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@85:account> <_parse@70:tt> <_parse@108:password> login l2 account <_parse@85:account>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@85:account> account <_parse@85:account> <_parse@70:tt> <_parse@108:password>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@85:account> login <_parse@83:login> <_parse@70:tt> <_parse@108:password>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> login <_parse@83:login> <_parse@70:tt> <_parse@108:password>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> login <_parse@83:login> <_parse@70:tt> <_parse@85:account> login <_parse@83:login> <_parse@70:tt> <_parse@108:password> password pwd2 password pwd2',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> login l2 <_parse@70:tt> <_parse@108:password> login l2',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@83:login> login l3 <_parse@70:tt> <_parse@108:password> password pwd1',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@85:account> <_parse@70:tt> <_parse@108:password>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@85:account> <_parse@70:tt> <_parse@108:password> <_parse@70:tt> <_parse@83:login>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@85:account> <_parse@70:tt> <_parse@108:password> account <_parse@85:account> password <_parse@108:password> account a3 account a3 account a3',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@85:account> <_parse@70:tt> <_parse@108:password> account a1 password <_parse@108:password>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@85:account> <_parse@70:tt> <_parse@83:login> account <_parse@85:account> <_parse@70:tt> <_parse@108:password>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@85:account> account <_parse@85:account> <_parse@70:tt> <_parse@83:login> <_parse@70:tt> <_parse@108:password>',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@85:account> account a1 account <_parse@85:account> <_parse@70:tt> <_parse@108:password> account a1',
        '<read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><read_token@137:nextchar><_parse@47:entryname> <_parse@70:tt> <_parse@85:account> account a3 <_parse@70:tt> <_parse@108:password> <create@27:self>chine <_parse@47:entryname> password <_parse@108:password>',
        'm',
        'ma'],
       '<read_token@137:nextchar>': [' ',
        '<__add__@1115:other>',
        '<create@27:self>'],
       '<_parse@47:entryname>': ['m1', 'm2', 'm3'],
       '<_parse@70:tt>': ['account', 'login', 'password'],
       '<_parse@108:password>': ['pwd1', 'pwd2', 'pwd3'],
       '<_parse@83:login>': ['l1', 'l2', 'l3'],
       '<_parse@85:account>': ['a1', 'a2', 'a3'],
       '<__add__@1115:other>': ['a', 'c', 'e', 'h', 'i', 'n']}
      
      . . .
      In [295]:
      xxxxxxxxxx
      
      4
       
      1
      if 'netrc' in CHECK:
      
      2
          result = check_precision('netrc.py', autogram_netrc_grammar_t)
      
      3
          Autogram_p['netrc.py'] = result
      
      4
          print(result)
      
      executed in 41.0s, finished 05:28:47 2019-08-15
      (30, 1000)
      
      . . .
      In [296]:
      xxxxxxxxxx
      
      4
       
      1
      if 'netrc' in CHECK:
      
      2
          result = check_recall(netrc_golden, autogram_netrc_grammar_t, subjects.netrc.main)
      
      3
          Autogram_r['netrc.py'] = result
      
      4
          print(result)
      
      executed in 1m 34.9s, finished 05:30:22 2019-08-15
      (773, 1000)
      
      . . .
      1
       
      1
      ### Microjson
      

      2.2.6  Microjson¶

      1
       
      1
      #### Microjson Validation
      

      2.2.6.1  Microjson Validation¶

      1
       
      1
      This is done through `json.tar.gz`
      

      This is done through json.tar.gz

      1
       
      1
      #### Samples
      

      2.2.6.2  Samples¶

      In [297]:
      xxxxxxxxxx
      
      281
       
      1
      # json samples
      
      2
      json_samples = [i.strip().replace('\n', ' ') for i in ['''
      
      3
      {"abcd":[],
      
      4
        "efgh":{"y":[],
      
      5
          "pqrstuv":  null,
      
      6
          "p":  "",
      
      7
          "q":"" ,
      
      8
          "r": "" ,
      
      9
          "float1": 1.0,
      
      10
          "float2":1.0,
      
      11
          "float3":1.0 ,
      
      12
          "float4": 1.0 ,
      
      13
           "_124": {"wx" :  null,
      
      14
           "zzyym!!2@@39": [1.1, 2452, 398, {"x":[[4,53,6,[7  ,8,90 ],10]]}]} }
      
      15
       }
      
      16
      ''',
      
      17
      '''
      
      18
      {"mykey1": [1, 2, 3], "mykey2": null, "mykey":"'`:{}<>&%[]\\\\^~|$'"}
      
      19
      ''','''
      
      20
      {"emptya": [], "emptyh": {}, "emptystr":"", "null":null}
      
      21
      ''', '''
      
      22
      [
      
      23
          "JSON Test Pattern pass1",
      
      24
          {"object with 1 member":["array with 1 element"]},
      
      25
          {},
      
      26
          [],
      
      27
          -42,
      
      28
          true,
      
      29
          false,
      
      30
          null,
      
      31
          {
      
      32
              "integer": 1234567890,
      
      33
              "real": -9876.543210,
      
      34
              "e": 0.123456789e-12,
      
      35
              "E": 1.234567890E+34,
      
      36
              "":  23456789012E66,
      
      37
              "zero": 0,
      
      38
              "one": 1,
      
      39
              "space": " ",
      
      40
              "quote": "\\"",
      
      41
              "backslash": "\\\\",
      
      42
              "controls": "\\b\\f\\n\\r\\t",
      
      43
              "slash": "/ & \\/",
      
      44
              "alpha": "abcdefghijklmnopqrstuvwyz",
      
      45
              "ALPHA": "ABCDEFGHIJKLMNOPQRSTUVWYZ",
      
      46
              "digit": "0123456789",
      
      47
              "0123456789": "digit",
      
      48
              "special": "`1~!@#$%^&*()_+-={':[,]}|;.</>?",
      
      49
              "true": true,
      
      50
              "false": false,
      
      51
              "null": null,
      
      52
              "array":[  ],
      
      53
              "object":{  },
      
      54
              "address": "50 St. James Street",
      
      55
              "url": "http://www.JSON.org/",
      
      56
              "comment": "// /* <!-- --",
      
      57
              "# -- --> */": " ",
      
      58
              " s p a c e d " :[1,2 , 3
      
      59
      ​
      
      60
      ,
      
      61
      ​
      
      62
      4 , 5        ,          6           ,7        ],"compact":[1,2,3,4,5,6,7],
      
      63
              "jsontext": "{\\"object with 1 member\\":[\\"array with 1 element\\"]}",
      
      64
              "quotes": "&#34; %22 0x22 034 &#x22;",
      
      65
              "\\/\\\\\\"\\b\\f\\n\\r\\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?"
      
      66
      : "A key can be any string"
      
      67
          },
      
      68
          0.5 ,98.6
      
      69
      ,
      
      70
      99.44
      
      71
      ,
      
      72
      ​
      
      73
      1066,
      
      74
      1e1,
      
      75
      0.1e1,
      
      76
      1e-1,
      
      77
      1e00,2e+00,2e-00
      
      78
      ,"rosebud"]
      
      79
      ''', '''
      
      80
      {"menu":
      
      81
        {
      
      82
          "id": "file",
      
      83
            "value": "File",
      
      84
            "popup": {
      
      85
              "menuitem": [
      
      86
              {"value": "New", "onclick": "CreateNewDoc()"},
      
      87
              {"value": "Open", "onclick": "OpenDoc()"},
      
      88
              {"value": "Close", "onclick": "CloseDoc()"}
      
      89
              ]
      
      90
            }
      
      91
        }
      
      92
      }
      
      93
      ''', '''
      
      94
      {
      
      95
        "XMDEwOlJlcG9zaXRvcnkxODQ2MjA4ODQ=": "-----BEGIN PGP SIGNATURE-----\n\niQIzBAABAQAdFiEESn/54jMNIrGSE6Tp6cQjvhfv7nAFAlnT71cACgkQ6cQjvhfv\n7nCWwA//XVqBKWO0zF+                             bZl6pggvky3Oc2j1pNFuRWZ29LXpNuD5WUGXGG209B0hI\nDkmcGk19ZKUTnEUJV2Xd0R7AW01S/YSub7OYcgBkI7qUE13FVHN5ln1KvH2all2n\n2+JCV1HcJLEoTjqIFZSSu/sMdhkLQ9/NsmMAzpf/           iIM0nQOyU4YRex9eD1bYj6nA\nOQPIDdAuaTQj1gFPHYLzM4zJnCqGdRlg0sOM/zC5apBNzIwlgREatOYQSCfCKV7k\nnrU34X8b9BzQaUx48Qa+Dmfn5KQ8dl27RNeWAqlkuWyv3pUauH9UeYW+KyuJeMkU\n+     NyHgAsWFaCFl23kCHThbLStMZOYEnGagrd0hnm1TPS4GJkV4wfYMwnI4KuSlHKB\njHl3Js9vNzEUQipQJbgCgTiWvRJoK3ENwBTMVkKHaqT4x9U4Jk/                                                XZB6Q8MA09ezJ\n3QgiTjTAGcum9E9QiJqMYdWQPWkaBIRRz5cET6HPB48YNXAAUsfmuYsGrnVLYbG+                                                                                     \nUpC6I97VybYHTy2O9XSGoaLeMI9CsFn38ycAxxbWagk5mhclNTP5mezIq6wKSwmr\nX11FW3n1J23fWZn5HJMBsRnUCgzqzX3871IqLYHqRJ/bpZ4h20RhTyPj5c/z7QXp\neSakNQMfbbMcljkha+            ZMuVQX1K9aRlVqbmv3ZMWh+OijLYVU2bc=\n=5Io4\n-----END PGP SIGNATURE-----\n"
      
      96
      }
      
      97
      ''', '''
      
      98
      {"widget":
      
      99
        {
      
      100
          "debug": "on",
      
      101
            "window": {
      
      102
              "title": "Sample Konfabulator Widget",
      
      103
              "name": "main_window",
      
      104
              "width": 500,
      
      105
              "height": 500
      
      106
            },
      
      107
            "image": {
      
      108
              "src": "Images/Sun.png",
      
      109
              "name": "sun1",
      
      110
              "hOffset": 250,
      
      111
              "vOffset": 250,
      
      112
              "alignment": "center"
      
      113
            },
      
      114
            "text": {
      
      115
              "data": "Click Here",
      
      116
              "size": 36,
      
      117
              "style": "bold",
      
      118
              "name": "text1",
      
      119
              "hOffset": 250,
      
      120
              "vOffset": 100,
      
      121
              "alignment": "center",
      
      122
              "onMouseUp": "sun1.opacity = (sun1.opacity / 100) * 90;"
      
      123
            }
      
      124
        }
      
      125
      }
      
      126
      ''',
      
      127
      '''
      
      128
      {
      
      129
          "fruit": "Apple",
      
      130
          "size": "Large",
      
      131
          "color": "Red",
      
      132
          "product": "Jam"
      
      133
      }
      
      134
      ''',
      
      135
      '''
      
      136
      {"menu":
      
      137
        {
      
      138
          "header": "SVG Viewer",
      
      139
            "items": [
      
      140
            {"id": "Open"},
      
      141
            {"id": "OpenNew", "label": "Open New"},
      
      142
            null,
      
      143
            {"id": "ZoomIn", "label": "Zoom In"},
      
      144
            {"id": "ZoomOut", "label": "Zoom Out"},
      
      145
            {"id": "OriginalView", "label": "Original View"},
      
      146
            null,
      
      147
            {"id": "Quality"},
      
      148
            {"id": "Pause"},
      
      149
            {"id": "Mute"},
      
      150
            null,
      
      151
            {"id": "Find", "label": "Find..."},
      
      152
            {"id": "FindAgain", "label": "Find Again"},
      
      153
            {"id": "Copy"},
      
      154
            {"id": "CopyAgain", "label": "Copy Again"},
      
      155
            {"id": "CopySVG", "label": "Copy SVG"},
      
      156
            {"id": "ViewSVG", "label": "View SVG"},
      
      157
            {"id": "ViewSource", "label": "View Source"},
      
      158
            {"id": "SaveAs", "label": "Save As"},
      
      159
            null,
      
      160
            {"id": "Help"},
      
      161
            {"id": "About", "label": "About Adobe CVG Viewer..."}
      
      162
          ]
      
      163
        }}
      
      164
      ''',
      
      165
      '''
      
      166
      {
      
      167
          "quiz": {
      
      168
              "sport": {
      
      169
                  "q1": {
      
      170
                      "question": "Which one is correct team name in NBA?",
      
      171
                      "options": [
      
      172
                          "New York Bulls",
      
      173
                          "Los Angeles Kings",
      
      174
                          "Golden State Warriros",
      
      175
                          "Huston Rocket"
      
      176
                      ],
      
      177
                      "answer": "Huston Rocket"
      
      178
                  }
      
      179
              },
      
      180
              "maths": {
      
      181
                  "q1": {
      
      182
                      "question": "5 + 7 = ?",
      
      183
                      "options": [
      
      184
                          "10",
      
      185
                          "11",
      
      186
                          "12",
      
      187
                          "13"
      
      188
                      ],
      
      189
                      "answer": "12"
      
      190
                  },
      
      191
                  "q2": {
      
      192
                      "question": "12 - 8 = ?",
      
      193
                      "options": [
      
      194
                          "1",
      
      195
                          "2",
      
      196
                          "3",
      
      197
                          "4"
      
      198
                      ],
      
      199
                      "answer": "4"
      
      200
                  }
      
      201
              }
      
      202
          }
      
      203
      }
      
      204
      ''',
      
      205
      '''
      
      206
      {
      
      207
        "colors":
      
      208
        [
      
      209
          {
      
      210
            "color": "black",
      
      211
            "category": "hue",
      
      212
            "type": "primary",
      
      213
            "code": {
      
      214
              "rgba": [255,255,255,1],
      
      215
              "hex": "#000"
      
      216
            }
      
      217
          },
      
      218
          {
      
      219
            "color": "white",
      
      220
            "category": "value",
      
      221
            "code": {
      
      222
              "rgba": [0,0,0,1],
      
      223
              "hex": "#FFF"
      
      224
            }
      
      225
          },
      
      226
          {
      
      227
            "color": "red",
      
      228
            "category": "hue",
      
      229
            "type": "primary",
      
      230
            "code": {
      
      231
              "rgba": [255,0,0,1],
      
      232
              "hex": "#FF0"
      
      233
            }
      
      234
          },
      
      235
          {
      
      236
            "color": "blue",
      
      237
            "category": "hue",
      
      238
            "type": "primary",
      
      239
            "code": {
      
      240
              "rgba": [0,0,255,1],
      
      241
              "hex": "#00F"
      
      242
            }
      
      243
          },
      
      244
          {
      
      245
            "color": "yellow",
      
      246
            "category": "hue",
      
      247
            "type": "primary",
      
      248
            "code": {
      
      249
              "rgba": [255,255,0,1],
      
      250
              "hex": "#FF0"
      
      251
            }
      
      252
          },
      
      253
          {
      
      254
            "color": "green",
      
      255
            "category": "hue",
      
      256
            "type": "secondary",
      
      257
            "code": {
      
      258
              "rgba": [0,255,0,1],
      
      259
              "hex": "#0F0"
      
      260
            }
      
      261
          }
      
      262
        ]
      
      263
      }
      
      264
      ''',
      
      265
      '''
      
      266
      {
      
      267
        "aliceblue": "#f0f8ff",
      
      268
        "antiquewhite": "#faebd7",
      
      269
        "aqua": "#00ffff",
      
      270
        "aquamarine": "#7fffd4",
      
      271
        "azure": "#f0ffff",
      
      272
        "beige": "#f5f5dc",
      
      273
        "bisque": "#ffe4c4",
      
      274
        "black": "#000000",
      
      275
        "blanchedalmond": "#ffebcd",
      
      276
        "blue": "#0000ff",
      
      277
        "blueviolet": "#8a2be2",
      
      278
        "brown": "#a52a2a",
      
      279
        "majenta": "#ff0ff"
      
      280
      }
      
      281
      ''']]
      
      executed in 12ms, finished 05:30:22 2019-08-15
      . . .
      1
       
      1
      #### Mimid
      

      2.2.6.3  Mimid¶

      In [298]:
      xxxxxxxxxx
      
      4
       
      1
      %%time
      
      2
      with timeit() as t:
      
      3
          microjson_grammar = accio_grammar('microjson.py', VARS['microjson_src'], json_samples)
      
      4
      Mimid_t['microjson.py'] = t.runtime
      
      executed in 6m 48s, finished 05:37:10 2019-08-15
      CPU times: user 1min 39s, sys: 19.1 s, total: 1min 58s
      Wall time: 6min 47s
      
      . . .
      In [299]:
      xxxxxxxxxx
      
      1
       
      1
      save_grammar(microjson_grammar, 'mimid', 'microjson')
      
      executed in 11ms, finished 05:37:10 2019-08-15
      Out[299]:
      {'<START>': ['<_from_json_raw>'],
       '<_from_json_raw>': ['<_from_json_number-1>',
        '<_from_json_raw-3>',
        '<_from_json_raw-4>',
        '<_from_json_raw-5>',
        '<_skip-1-s><_from_json_raw-2>',
        'false',
        'null',
        'true'],
       '<_from_json_number-1>': ['<_from_json_number-1-s>',
        '<_from_json_number-1-s>e<_from_json_number-3-s>'],
       '<_from_json_raw-3>': ['[<_from_json_list-0-c>'],
       '<_from_json_raw-4>': ['{<_from_json_dict-0-c>'],
       '<_from_json_raw-5>': ['"<_from_json_string-0-c>'],
       '<_skip-1-s>': [' ', ' <_skip-1-s>'],
       '<_from_json_raw-2>': ['<_from_json_number-1>',
        '<_from_json_raw-3>',
        '<_from_json_raw-4>',
        '<_from_json_raw-5>',
        'false',
        'null',
        'true'],
       '<_from_json_number-1-s>': ['<_from_json_number>',
        '<_from_json_number><_from_json_number-1-s>'],
       '<_from_json_number-3-s>': ['<_from_json_number>',
        '<_from_json_number><_from_json_number-3-s>'],
       '<_from_json_number>': ['+',
        '-',
        '.',
        '0',
        '1',
        '2',
        '3',
        '4',
        '5',
        '6',
        '7',
        '8',
        '9',
        'E',
        'e'],
       '<_from_json_list-0-c>': ['<_from_json_list-10><_from_json_list-1-c>',
        '<_from_json_list-12>',
        '<_from_json_list-5-s><_from_json_list-6-s><_from_json_list-6-c>'],
       '<_from_json_list-10>': ['<_from_json_raw>', '<_skip-1-s><_from_json_raw>'],
       '<_from_json_list-1-c>': ['<_from_json_list-12>',
        '<_from_json_list-2-s><_from_json_list-12>'],
       '<_from_json_list-12>': ['<_skip-1-s>]', ']'],
       '<_from_json_list-5-s>': ['<_from_json_list-7>',
        '<_from_json_list-7><_from_json_list-5-s>'],
       '<_from_json_list-6-s>': ['<_from_json_list>',
        '<_from_json_list><_from_json_list-6-s>'],
       '<_from_json_list-6-c>': ['<_from_json_list-12>',
        '<_from_json_list-8-s><_from_json_list-9-s><_from_json_list-12>'],
       '<_from_json_list-2-s>': ['<_from_json_list-7>',
        '<_from_json_list-7><_from_json_list-2-s>'],
       '<_from_json_list-7>': ['<_from_json_list-3>',
        '<_from_json_list-4>',
        '<_from_json_raw>'],
       '<_from_json_list-3>': [',<_from_json_raw>'],
       '<_from_json_list-4>': ['<_skip-1-s><_from_json_list-3>'],
       '<_from_json_list>': ['<_from_json_list-3>', '<_from_json_list-4>'],
       '<_from_json_list-8-s>': ['<_from_json_list-7>',
        '<_from_json_list-7><_from_json_list-8-s>'],
       '<_from_json_list-9-s>': ['<_from_json_list>',
        '<_from_json_list><_from_json_list-9-s>'],
       '<_from_json_dict-0-c>': ['<_from_json_dict-1>',
        '<_from_json_dict-3-s><_from_json_dict-1>',
        '<_from_json_dict-5>'],
       '<_from_json_dict-1>': ['<_from_json_dict><_from_json_dict-5>'],
       '<_from_json_dict-3-s>': ['<_from_json_dict><_from_json_dict-7>',
        '<_from_json_dict><_from_json_dict-7><_from_json_dict-3-s>'],
       '<_from_json_dict-5>': ['<_skip-1-s>}', '}'],
       '<_from_json_dict>': ['<_from_json_dict-4>',
        '<_skip-1-s><_from_json_dict-4>'],
       '<_from_json_dict-4>': ['"<_from_json_string-0-c><_from_json_dict-10>'],
       '<_from_json_string-0-c>': ['"', '<_from_json_string-1-s>"'],
       '<_from_json_dict-10>': ['<_from_json_dict-12>',
        '<_skip-1-s><_from_json_dict-12>'],
       '<_from_json_string-1-s>': ['<_from_json_string>',
        '<_from_json_string><_from_json_string-1-s>'],
       '<_from_json_string>': [' ',
        '!',
        '#',
        '$',
        '%',
        '&',
        "'",
        '(',
        ')',
        '*',
        '+',
        ',',
        '-',
        '.',
        '/',
        '0',
        '1',
        '2',
        '3',
        '4',
        '5',
        '6',
        '7',
        '8',
        '9',
        ':',
        ';',
        '<',
        '=',
        '>',
        '?',
        '@',
        'A',
        'B',
        'C',
        'D',
        'E',
        'F',
        'G',
        'H',
        'I',
        'J',
        'K',
        'L',
        'M',
        'N',
        'O',
        'P',
        'Q',
        'R',
        'S',
        'T',
        'U',
        'V',
        'W',
        'X',
        'Y',
        'Z',
        '[',
        '\\<decode_escape-0-c>',
        ']',
        '^',
        '_',
        '`',
        'a',
        'b',
        'c',
        'd',
        'e',
        'f',
        'g',
        'h',
        'i',
        'j',
        'k',
        'l',
        'm',
        'n',
        'o',
        'p',
        'q',
        'r',
        's',
        't',
        'u',
        'v',
        'w',
        'x',
        'y',
        'z',
        '{',
        '|',
        '}',
        '~'],
       '<decode_escape-0-c>': ['"', '/', '\\', 'b', 'f', 'n', 'r', 't'],
       '<_from_json_dict-12>': [':<_from_json_list-10>'],
       '<_from_json_dict-7>': [',', '<_skip-1-s>,']}
      
      . . .
      In [300]:
      xxxxxxxxxx
      
      4
       
      1
      if 'microjson' in CHECK:
      
      2
          result = check_precision('microjson.py', microjson_grammar)
      
      3
          Mimid_p['microjson.py'] = result
      
      4
          print(result)
      
      executed in 27.2s, finished 05:37:37 2019-08-15
      (924, 1000)
      
      . . .
      In [301]:
      xxxxxxxxxx
      
      1
       
      1
      import subjects.microjson
      
      executed in 10ms, finished 05:37:37 2019-08-15
      . . .
      In [302]:
      xxxxxxxxxx
      
      1
       
      1
      import pathlib
      
      executed in 5ms, finished 05:37:37 2019-08-15
      . . .
      In [303]:
      xxxxxxxxxx
      
      4
       
      1
      def slurp(fn):
      
      2
          with open(fn) as f:
      
      3
              s = f.read()
      
      4
              return s.replace('\n', ' ').strip()
      
      executed in 6ms, finished 05:37:37 2019-08-15
      . . .
      In [304]:
      xxxxxxxxxx
      
      6
       
      1
      if shutil.which('gzcat'):
      
      2
          !gzcat json.tar.gz | tar -xpf -
      
      3
      elif shutil.which('zcat'):
      
      4
          !zcat json.tar.gz | tar -xpf -
      
      5
      else:
      
      6
          assert False
      
      executed in 179ms, finished 05:37:38 2019-08-15
      . . .
      In [305]:
      xxxxxxxxxx
      
      3
       
      1
      json_path = pathlib.Path('recall/json')
      
      2
      json_files =  [i.as_posix() for i in json_path.glob('**/*.json')]
      
      3
      json_samples_2 = [slurp(i) for i in json_files]
      
      executed in 15ms, finished 05:37:38 2019-08-15
      . . .
      In [306]:
      xxxxxxxxxx
      
      20
       
      1
      def check_recall_samples(samples, my_grammar, validator, log=False):
      
      2
          n_max = len(samples)
      
      3
          ie = IterativeEarleyParser(my_grammar, start_symbol='<START>')
      
      4
          my_samples = list(samples)
      
      5
          count = 0
      
      6
          while my_samples:
      
      7
              src, *my_samples = my_samples
      
      8
              try:
      
      9
                  validator(src)
      
      10
                  try:
      
      11
                      # JSON files are much larger because they are from real world
      
      12
                      for tree in ie.parse(src):
      
      13
                          count += 1
      
      14
                          break
      
      15
                      if log: print('+', repr(src), count, file=sys.stderr)
      
      16
                  except:
      
      17
                      if log: print('-', repr(src), file=sys.stderr)
      
      18
              except:
      
      19
                  pass
      
      20
          return (count, n_max)
      
      executed in 10ms, finished 05:37:38 2019-08-15
      . . .
      In [307]:
      xxxxxxxxxx
      
      4
       
      1
      if 'microjson' in CHECK:
      
      2
          result = check_recall_samples(json_samples_2, microjson_grammar, subjects.microjson.main)
      
      3
          Mimid_r['microjson.py'] = result
      
      4
          print(result)
      
      executed in 29m 6s, finished 06:06:44 2019-08-15
      (93, 100)
      
      . . .
      1
       
      1
      #### Autogram
      

      2.2.6.4  Autogram¶

      In [308]:
      xxxxxxxxxx
      
      4
       
      1
      %%time
      
      2
      with timeit() as t:
      
      3
          autogram_microjson_grammar_t = recover_grammar_with_taints('microjson.py', VARS['microjson_src'], json_samples)
      
      4
      Autogram_t['microjson.py'] = t.runtime
      
      executed in 7m 31s, finished 06:14:15 2019-08-15
      CPU times: user 16.4 ms, sys: 16.6 ms, total: 33 ms
      Wall time: 7min 30s
      
      . . .
      In [309]:
      xxxxxxxxxx
      
      1
       
      1
      save_grammar(autogram_microjson_grammar_t, 'autogram_t', 'microjson')
      
      executed in 13ms, finished 06:14:15 2019-08-15
      Out[309]:
      {'<START>': ['<tell@115:self.buf>'],
       '<tell@115:self.buf>': ['<_skip@76:c>     "JSON Test Pattern pass1",     {"object with 1 member":<_from_json_raw@283:c>"array with 1 element"]},     {},     [],     -42,     true,     false,     null,     {         "integer": 1234567890,         "real": -9876.543210,         "e": 0.123456789e-12,         "E": 1.234567890E+34,         "":  23456789012E66,         "zero": 0,         "one": 1,         "space": " ",         "quote": "\\"",         "backslash": "\\\\",         "controls": "\\b\\f\\n\\r\\t",         "slash": "/ & \\/",         "alpha": "abcdefghijklmnopqrstuvwyz",         "ALPHA": "ABCDEFGHIJKLMNOPQRSTUVWYZ",         "digit": "0123456789",         "0123456789": "digit",         "special": "`1~!@#$%^&*()_+-={\':[,]}|;.<openA>/<closeA>?",         "true": true,         "false": false,         "null": null,         "array":[  ],         "object":{  },         "address": "50 St. James Street",         "url": "http://www.JSON.org/",         "comment": "// /* <!-- --",         "# -- --> */": " ",         " s p a c e d " :[1,2 , 3  ,  4 , 5        ,          6           ,7        ],"compact":[1,2,3,4,5,6,7],         "jsontext": "{\\"object with 1 member\\":[\\"array with 1 element\\"]}",         "quotes": "&#34; %22 0x22 034 &#x22;",         "\\/\\\\\\"\\b\\f\\n\\r\\t`1~!@#$%^&*()_+-=[]{}|;:\',./<openA><closeA>?" : "A key can be any string"     },     0.5 ,98.6 , 99.44 ,  1066, 1e1, 0.1e1, 1e-1, 1e00,2e+00,2e-00 ,"rosebud"]',
        '<_skip@76:c>     "fruit": "<from_json@313:v.fruit>",     "size": "<from_json@313:v.size>",     "color": "<from_json@313:v.color>",     "product": "<from_json@313:v.product>" }',
        '<_skip@76:c>     "quiz": <_from_json_raw@283:c>         "sport": {             "q1": {                 "question": "<from_json@313:v.quiz.sport.q1.question>",                 "options": [                     "New York Bulls",                     "Los Angeles Kings",                     "Golden State Warriros",                     "<from_json@313:v.quiz.sport.q1.answer>"                 ],                 "answer": "Huston Rocket"             }         },         "maths": {             "q1": {                 "question": "<from_json@313:v.quiz.maths.q1.question>",                 "options": [                     "10",                     "11",                     "<from_json@313:v.quiz.maths.q1.answer>",                     "13"                 ],                 "answer": "12"             },             "q2": {                 "question": "<from_json@313:v.quiz.maths.q2.question>",                 "options": [                     "1",                     "2",                     "3",                     "<from_json@313:v.quiz.maths.q2.answer>"                 ],                 "answer": "4"             }         }     } }',
        '<_skip@76:c>   "XMDEwOlJlcG9zaXRvcnkxODQ2MjA4ODQ=": "<from_json@313:v.xmdewoljlcg9zaxrvcnkxodq2mja4odq=>" }',
        '<_skip@76:c>   "aliceblue": "<from_json@313:v.aliceblue>",   "antiquewhite": "<from_json@313:v.antiquewhite>",   "aqua": "<from_json@313:v.aqua>",   "aquamarine": "<from_json@313:v.aquamarine>",   "azure": "<from_json@313:v.azure>",   "beige": "<from_json@313:v.beige>",   "bisque": "<from_json@313:v.bisque>",   "black": "<from_json@313:v.black>",   "blanchedalmond": "<from_json@313:v.blanchedalmond>",   "blue": "<from_json@313:v.blue>",   "blueviolet": "<from_json@313:v.blueviolet>",   "brown": "<from_json@313:v.brown>",   "majenta": "<from_json@313:v.majenta>" }',
        '<_skip@76:c>   "colors":   [     <_from_json_raw@283:c>       "color": "black",       "category": "hue",       "type": "primary",       "code": {         "rgba": [255,255,255,1],         "hex": "#000"       }     },     {       "color": "white",       "category": "value",       "code": {         "rgba": [0,0,0,1],         "hex": "#FFF"       }     },     {       "color": "red",       "category": "hue",       "type": "primary",       "code": {         "rgba": [255,0,0,1],         "hex": "#FF0"       }     },     {       "color": "blue",       "category": "hue",       "type": "primary",       "code": {         "rgba": [0,0,255,1],         "hex": "#00F"       }     },     {       "color": "yellow",       "category": "hue",       "type": "primary",       "code": {         "rgba": [255,255,0,1],         "hex": "#FF0"       }     },     {       "color": "green",       "category": "hue",       "type": "secondary",       "code": {         "rgba": [0,255,0,1],         "hex": "#0F0"       }     }   ] }',
        '<_skip@76:c>"abcd":[],   "efgh":<_from_json_raw@283:c>"y":[],     "pqrstuv":  <from_json@313:v.efgh._124.wx>,     "p":  "",     "q":"" ,     "r": "" ,     "float1": <from_json@313:v.efgh.float4>,     "float2":1.0,     "float3":1.0 ,     "float4": 1.0 ,      "_124": {"wx" :  null,      "zzyym!!2@@39": [1.1, 2452, 398, {"x":[[4,53,6,[7  ,8,90 ],10]]}]} }  }',
        '<_skip@76:c>"emptya": [], "emptyh": <_from_json_raw@283:c>}, "emptystr":"", "<from_json@313:v.null>":null}',
        '<_skip@76:c>"menu":   <_from_json_raw@283:c>     "header": "<from_json@313:v.menu.header>",       "items": [       {"id": "Open"},       {"id": "OpenNew", "label": "Open New"},       null,       {"id": "ZoomIn", "label": "Zoom In"},       {"id": "ZoomOut", "label": "Zoom Out"},       {"id": "OriginalView", "label": "Original View"},       null,       {"id": "Quality"},       {"id": "Pause"},       {"id": "Mute"},       null,       {"id": "Find", "label": "Find..."},       {"id": "FindAgain", "label": "Find Again"},       {"id": "Copy"},       {"id": "CopyAgain", "label": "Copy Again"},       {"id": "CopySVG", "label": "Copy SVG"},       {"id": "ViewSVG", "label": "View SVG"},       {"id": "ViewSource", "label": "View Source"},       {"id": "SaveAs", "label": "Save As"},       null,       {"id": "Help"},       {"id": "About", "label": "About Adobe CVG Viewer..."}     ]   }}',
        '<_skip@76:c>"menu":   <_from_json_raw@283:c>     "id": "<from_json@313:v.menu.id>",       "value": "<from_json@313:v.menu.value>",       "popup": {         "menuitem": [         {"value": "New", "onclick": "CreateNewDoc()"},         {"value": "Open", "onclick": "OpenDoc()"},         {"value": "Close", "onclick": "CloseDoc()"}         ]       }   } }',
        '<_skip@76:c>"mykey1": [1, 2, 3], "mykey2": <from_json@313:v.mykey2>, "mykey":"\'`:<_from_json_raw@283:c>}<openA><closeA>&%[]\\\\^~|$\'"}',
        '<_skip@76:c>"widget":   <_from_json_raw@283:c>     "debug": "<from_json@313:v.widget.debug>",       "window": {         "title": "<from_json@313:v.widget.window.title>",         "name": "<from_json@313:v.widget.window.name>",         "width": <from_json@313:v.widget.window.height>,         "height": 500       },       "image": {         "src": "<from_json@313:v.widget.image.src>",         "name": "<from_json@313:v.widget.image.name>",         "hOffset": <from_json@313:v.widget.text.hoffset>,         "vOffset": 250,         "alignment": "<from_json@313:v.widget.text.alignment>"       },       "text": {         "data": "<from_json@313:v.widget.text.data>",         "size": <from_json@313:v.widget.text.size>,         "style": "<from_json@313:v.widget.text.style>",         "name": "<from_json@313:v.widget.text.name>",         "hOffset": 250,         "vOffset": <from_json@313:v.widget.text.voffset>,         "alignment": "center",         "onMouseUp": "<from_json@313:v.widget.text.onmouseup>"       }   } }'],
       '<_skip@76:c>': ['<<openA>lambda<closeA>@72:c>'],
       '<_from_json_raw@283:c>': ['[', '{'],
       '<openA>': ['<'],
       '<closeA>': ['<'],
       '<from_json@313:v.fruit>': ['Apple'],
       '<from_json@313:v.size>': ['Large'],
       '<from_json@313:v.color>': ['Red'],
       '<from_json@313:v.product>': ['Jam'],
       '<from_json@313:v.quiz.sport.q1.question>': ['Which one is correct team name in NBA?'],
       '<from_json@313:v.quiz.sport.q1.answer>': ['Huston Rocket'],
       '<from_json@313:v.quiz.maths.q1.question>': ['5 + 7 = ?'],
       '<from_json@313:v.quiz.maths.q1.answer>': ['12'],
       '<from_json@313:v.quiz.maths.q2.question>': ['12 - 8 = ?'],
       '<from_json@313:v.quiz.maths.q2.answer>': ['4'],
       '<from_json@313:v.xmdewoljlcg9zaxrvcnkxodq2mja4odq=>': ['-----BEGIN PGP SIGNATURE-----  iQIzBAABAQAdFiEESn/54jMNIrGSE6Tp6cQjvhfv7nAFAlnT71cACgkQ6cQjvhfv 7nCWwA//XVqBKWO0zF+                             bZl6pggvky3Oc2j1pNFuRWZ29LXpNuD5WUGXGG209B0hI DkmcGk19ZKUTnEUJV2Xd0R7AW01S/YSub7OYcgBkI7qUE13FVHN5ln1KvH2all2n 2+JCV1HcJLEoTjqIFZSSu/sMdhkLQ9/NsmMAzpf/           iIM0nQOyU4YRex9eD1bYj6nA OQPIDdAuaTQj1gFPHYLzM4zJnCqGdRlg0sOM/zC5apBNzIwlgREatOYQSCfCKV7k nrU34X8b9BzQaUx48Qa+Dmfn5KQ8dl27RNeWAqlkuWyv3pUauH9UeYW+KyuJeMkU +     NyHgAsWFaCFl23kCHThbLStMZOYEnGagrd0hnm1TPS4GJkV4wfYMwnI4KuSlHKB jHl3Js9vNzEUQipQJbgCgTiWvRJoK3ENwBTMVkKHaqT4x9U4Jk/                                                XZB6Q8MA09ezJ 3QgiTjTAGcum9E9QiJqMYdWQPWkaBIRRz5cET6HPB48YNXAAUsfmuYsGrnVLYbG+                                                                                      UpC6I97VybYHTy2O9XSGoaLeMI9CsFn38ycAxxbWagk5mhclNTP5mezIq6wKSwmr X11FW3n1J23fWZn5HJMBsRnUCgzqzX3871IqLYHqRJ/bpZ4h20RhTyPj5c/z7QXp eSakNQMfbbMcljkha+            ZMuVQX1K9aRlVqbmv3ZMWh+OijLYVU2bc= =5Io4 -----END PGP SIGNATURE----- '],
       '<from_json@313:v.aliceblue>': ['#f0f8ff'],
       '<from_json@313:v.antiquewhite>': ['#faebd7'],
       '<from_json@313:v.aqua>': ['#00ffff'],
       '<from_json@313:v.aquamarine>': ['#7fffd4'],
       '<from_json@313:v.azure>': ['#f0ffff'],
       '<from_json@313:v.beige>': ['#f5f5dc'],
       '<from_json@313:v.bisque>': ['#ffe4c4'],
       '<from_json@313:v.black>': ['#000000'],
       '<from_json@313:v.blanchedalmond>': ['#ffebcd'],
       '<from_json@313:v.blue>': ['#0000ff'],
       '<from_json@313:v.blueviolet>': ['#8a2be2'],
       '<from_json@313:v.brown>': ['#a52a2a'],
       '<from_json@313:v.majenta>': ['#ff0ff'],
       '<from_json@313:v.efgh._124.wx>': ['null'],
       '<from_json@313:v.efgh.float4>': ['1.0'],
       '<from_json@313:v.null>': ['null'],
       '<from_json@313:v.menu.header>': ['SVG Viewer'],
       '<from_json@313:v.menu.id>': ['file'],
       '<from_json@313:v.menu.value>': ['File'],
       '<from_json@313:v.mykey2>': ['null'],
       '<from_json@313:v.widget.debug>': ['on'],
       '<from_json@313:v.widget.window.title>': ['Sample Konfabulator Widget'],
       '<from_json@313:v.widget.window.name>': ['main_window'],
       '<from_json@313:v.widget.window.height>': ['500'],
       '<from_json@313:v.widget.image.src>': ['Images/Sun.png'],
       '<from_json@313:v.widget.image.name>': ['sun1'],
       '<from_json@313:v.widget.text.hoffset>': ['250'],
       '<from_json@313:v.widget.text.alignment>': ['center'],
       '<from_json@313:v.widget.text.data>': ['Click Here'],
       '<from_json@313:v.widget.text.size>': ['36'],
       '<from_json@313:v.widget.text.style>': ['bold'],
       '<from_json@313:v.widget.text.name>': ['text1'],
       '<from_json@313:v.widget.text.voffset>': ['100'],
       '<from_json@313:v.widget.text.onmouseup>': ['sun1.opacity = (sun1.opacity / 100) * 90;']}
      
      . . .
      In [310]:
      xxxxxxxxxx
      
      4
       
      1
      if 'microjson' in CHECK:
      
      2
          result = check_precision('microjson.py', autogram_microjson_grammar_t)
      
      3
          Autogram_p['microjson.py'] = result
      
      4
          print(result)
      
      executed in 1.82s, finished 06:14:17 2019-08-15
      (0, 1000)
      
      . . .
      In [311]:
      xxxxxxxxxx
      
      4
       
      1
      if 'microjson' in CHECK:
      
      2
          result = check_recall_samples(json_samples_2, autogram_microjson_grammar_t, subjects.microjson.main)
      
      3
          Autogram_r['microjson.py'] = result
      
      4
          print(result)
      
      executed in 14.8s, finished 06:14:31 2019-08-15
      (0, 100)
      
      . . .
      1
       
      1
      ## Results
      

      2.3  Results¶

      1
       
      1
      Note that we found and fixed a bug in the Information flow chapter of the fuzzingbook, which was causing Autogram to fail (See `flatten` and `ostr_new` in `recover_grammar_with_taints`). This caused the precision numbers of Autogram to improve. However, please see the grammars generated. They are still enumerations.
      

      Note that we found and fixed a bug in the Information flow chapter of the fuzzingbook, which was causing Autogram to fail (See flatten and ostr_new in recover_grammar_with_taints). This caused the precision numbers of Autogram to improve. However, please see the grammars generated. They are still enumerations.

      In [312]:
      xxxxxxxxxx
      
      1
       
      1
      from IPython.display import HTML, display
      
      executed in 36ms, finished 06:14:31 2019-08-15
      . . .
      In [313]:
      xxxxxxxxxx
      
      9
       
      1
      def show_table(keys, autogram, mimid, title):
      
      2
          keys = [k for k in keys if k in autogram and k in mimid and autogram[k] and mimid[k]]
      
      3
          tbl = ['<tr>%s</tr>' % ''.join(["<th>%s</th>" % k for k in ['<b>%s</b>' % title,'Autogram', 'Mimid']])]
      
      4
          for k in keys:
      
      5
              h_c = "<td>%s</td>" % k
      
      6
              a_c = "<td>%s</td>" % autogram.get(k,('',0))[0]
      
      7
              m_c = "<td>%s</td>" % mimid.get(k,('',0))[0]
      
      8
              tbl.append('<tr>%s</tr>' % ''.join([h_c, a_c, m_c]))
      
      9
          return display(HTML('<table>%s</table>' % '\n'.join(tbl)))
      
      executed in 39ms, finished 06:14:31 2019-08-15
      . . .
      In [314]:
      xxxxxxxxxx
      
      2
       
      1
      def to_sec(hm):
      
      2
          return {k:((hm[k][1]).seconds, ' ') for k in hm if hm[k]}
      
      executed in 32ms, finished 06:14:31 2019-08-15
      . . .
      1
       
      1
      ### Table II (Time in Seconds)
      

      2.3.1  Table II (Time in Seconds)¶

      In [315]:
      xxxxxxxxxx
      
      1
       
      1
      Autogram_t
      
      executed in 41ms, finished 06:14:32 2019-08-15
      Out[315]:
      {'calculator.py': (552209, datetime.timedelta(seconds=6, microseconds=552209)),
       'mathexpr.py': (538847, datetime.timedelta(seconds=26, microseconds=538847)),
       'urlparse.py': (997539, datetime.timedelta(seconds=47, microseconds=997539)),
       'netrc.py': (167983, datetime.timedelta(seconds=179, microseconds=167983)),
       'cgidecode.py': (905319, datetime.timedelta(seconds=24, microseconds=905319)),
       'microjson.py': (656220,
        datetime.timedelta(seconds=450, microseconds=656220))}
      
      . . .
      In [316]:
      xxxxxxxxxx
      
      1
       
      1
      Mimid_t
      
      executed in 10ms, finished 06:14:32 2019-08-15
      Out[316]:
      {'calculator.py': (718471, datetime.timedelta(seconds=6, microseconds=718471)),
       'mathexpr.py': (21477, datetime.timedelta(seconds=17, microseconds=21477)),
       'urlparse.py': (125374, datetime.timedelta(seconds=6, microseconds=125374)),
       'netrc.py': (384119, datetime.timedelta(seconds=30, microseconds=384119)),
       'cgidecode.py': (833595, datetime.timedelta(seconds=31, microseconds=833595)),
       'microjson.py': (952269,
        datetime.timedelta(seconds=407, microseconds=952269))}
      
      . . .
      In [317]:
      xxxxxxxxxx
      
      1
       
      1
      show_table(Autogram_t.keys(), to_sec(Autogram_t), to_sec(Mimid_t), 'Timing')
      
      executed in 21ms, finished 06:14:32 2019-08-15
      Timing Autogram Mimid
      calculator.py 6 6
      mathexpr.py 26 17
      urlparse.py 47 6
      netrc.py 179 30
      cgidecode.py 24 31
      microjson.py 450 407
      . . .
      4
       
      1
      ### Table III (Precision)
      
      2
      How many inputs we generate using our inferred grammar is valid? (accepted by the subject program?)
      
      3
      ​
      
      4
      Note that the paper reports precision per 100 inputs. We have increased the count to 1000.
      

      2.3.2  Table III (Precision)¶

      How many inputs we generate using our inferred grammar is valid? (accepted by the subject program?)

      Note that the paper reports precision per 100 inputs. We have increased the count to 1000.

      In [318]:
      xxxxxxxxxx
      
      1
       
      1
      Autogram_p
      
      executed in 41ms, finished 06:14:32 2019-08-15
      Out[318]:
      {'calculator.py': (395, 1000),
       'mathexpr.py': (301, 1000),
       'urlparse.py': (1000, 1000),
       'netrc.py': (30, 1000),
       'cgidecode.py': (460, 1000),
       'microjson.py': (0, 1000)}
      
      . . .
      In [319]:
      xxxxxxxxxx
      
      1
       
      1
      Mimid_p
      
      executed in 7ms, finished 06:14:32 2019-08-15
      Out[319]:
      {'calculator.py': (1000, 1000),
       'mathexpr.py': (699, 1000),
       'urlparse.py': (1000, 1000),
       'netrc.py': (773, 1000),
       'cgidecode.py': (1000, 1000),
       'microjson.py': (924, 1000)}
      
      . . .
      In [320]:
      xxxxxxxxxx
      
      1
       
      1
      show_table(Autogram_p.keys(), Autogram_p, Mimid_p, 'Precision')
      
      executed in 11ms, finished 06:14:32 2019-08-15
      Precision Autogram Mimid
      calculator.py 395 1000
      mathexpr.py 301 699
      urlparse.py 1000 1000
      netrc.py 30 773
      cgidecode.py 460 1000
      microjson.py 0 924
      . . .
      5
       
      1
      ### Table IV (Recall)
      
      2
      ​
      
      3
      How many *valid* inputs generated by the golden grammar or collected externally are parsable by a parser using our grammar?
      
      4
      ​
      
      5
      Note that the recall is reported per 100 inputs in paper. We have increased the count to 1000. For Microjson, the recall numbers are based on 100 realworld documents. These are available in json.tar.gz that is bundled along with this notebook.
      

      2.3.3  Table IV (Recall)¶

      How many valid inputs generated by the golden grammar or collected externally are parsable by a parser using our grammar?

      Note that the recall is reported per 100 inputs in paper. We have increased the count to 1000. For Microjson, the recall numbers are based on 100 realworld documents. These are available in json.tar.gz that is bundled along with this notebook.

      In [321]:
      xxxxxxxxxx
      
      1
       
      1
      Autogram_r
      
      executed in 8ms, finished 06:14:32 2019-08-15
      Out[321]:
      {'calculator.py': (1, 1000),
       'mathexpr.py': (0, 1000),
       'urlparse.py': (277, 1000),
       'netrc.py': (773, 1000),
       'cgidecode.py': (380, 1000),
       'microjson.py': (0, 100)}
      
      . . .
      In [322]:
      xxxxxxxxxx
      
      1
       
      1
      Mimid_r
      
      executed in 6ms, finished 06:14:32 2019-08-15
      Out[322]:
      {'calculator.py': (1000, 1000),
       'mathexpr.py': (922, 1000),
       'urlparse.py': (153, 1000),
       'netrc.py': (949, 1000),
       'cgidecode.py': (1000, 1000),
       'microjson.py': (93, 100)}
      
      . . .
      In [323]:
      xxxxxxxxxx
      
      1
       
      1
      show_table(Autogram_p.keys(), Autogram_r, Mimid_r, 'Recall')
      
      executed in 7ms, finished 06:14:32 2019-08-15
      Recall Autogram Mimid
      calculator.py 1 1000
      mathexpr.py 0 922
      urlparse.py 277 153
      netrc.py 773 949
      cgidecode.py 380 1000
      microjson.py 0 93
      . . .
      1
       
      1
      ## Using a Recognizer (not a Parser)
      

      2.4  Using a Recognizer (not a Parser)¶

      In [324]:
      xxxxxxxxxx
      
      46
       
      1
      %%var calc_rec_src
      
      2
      import string
      
      3
      def is_digit(i):
      
      4
          return i in list(string.digits)
      
      5
      ​
      
      6
      def parse_num(s,i):
      
      7
          while s[i:] and is_digit(s[i]):
      
      8
              i = i +1
      
      9
          return i
      
      10
      ​
      
      11
      def parse_paren(s, i):
      
      12
          assert s[i] == '('
      
      13
          i = parse_expr(s, i+1)
      
      14
          if s[i:] == '':
      
      15
              raise Exception(s, i)
      
      16
          assert s[i] == ')'
      
      17
          return i+1
      
      18
      ​
      
      19
      ​
      
      20
      def parse_expr(s, i = 0):
      
      21
          expr = []
      
      22
          is_op = True
      
      23
          while s[i:] != '':
      
      24
              c = s[i]
      
      25
              if c in list(string.digits):
      
      26
                  if not is_op: raise Exception(s,i)
      
      27
                  i = parse_num(s,i)
      
      28
                  is_op = False
      
      29
              elif c in ['+', '-', '*', '/']:
      
      30
                  if is_op: raise Exception(s,i)
      
      31
                  is_op = True
      
      32
                  i = i + 1
      
      33
              elif c == '(':
      
      34
                  if not is_op: raise Exception(s,i)
      
      35
                  i = parse_paren(s, i)
      
      36
                  is_op = False
      
      37
              elif c == ')':
      
      38
                  break
      
      39
              else:
      
      40
                  raise Exception(s,i)
      
      41
          if is_op:
      
      42
              raise Exception(s,i)
      
      43
          return i
      
      44
      ​
      
      45
      def main(arg):
      
      46
          parse_expr(arg)
      
      executed in 36ms, finished 06:14:32 2019-08-15
      . . .
      In [325]:
      xxxxxxxxxx
      
      1
       
      1
      calc_rec_grammar = accio_grammar('cal.py', VARS['calc_rec_src'], calc_samples)
      
      executed in 2.89s, finished 06:14:35 2019-08-15
      . . .
      In [326]:
      xxxxxxxxxx
      
      1
       
      1
      calc_rec_grammar
      
      executed in 12ms, finished 06:14:35 2019-08-15
      Out[326]:
      {'<START>': ['<parse_expr-0-c>'],
       '<parse_expr-0-c>': ['<parse_expr-1>', '<parse_expr-2-s><parse_expr-1>'],
       '<parse_expr-1>': ['(<parse_expr-0-c>)', '<parse_num-1-s>'],
       '<parse_expr-2-s>': ['<parse_expr-1><parse_expr>',
        '<parse_expr-1><parse_expr><parse_expr-2-s>'],
       '<parse_num-1-s>': ['<is_digit-0-c>', '<is_digit-0-c><parse_num-1-s>'],
       '<is_digit-0-c>': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],
       '<parse_expr>': ['*', '+', '-', '/']}
      
      . . .
      1
       
      1
      ## Parsing with Parser Combinators
      

      2.5  Parsing with Parser Combinators¶

      1
       
      1
      ### Helper
      

      2.5.1  Helper¶

      In [352]:
      xxxxxxxxxx
      
      290
       
      1
      %%var myparsec_src
      
      2
      # From https://github.com/xmonader/pyparsec
      
      3
      from functools import reduce
      
      4
      import string 
      
      5
      flatten = lambda l: [item for sublist in l for item in (sublist if isinstance(sublist, list) else [sublist] )]
      
      6
      ​
      
      7
      class Maybe:
      
      8
          pass
      
      9
      ​
      
      10
      class Just(Maybe):
      
      11
          def __init__(self, val):
      
      12
              self.val = val
      
      13
      ​
      
      14
          def __str__(self):
      
      15
              return "<Just %s>"%str(self.val)
      
      16
          
      
      17
      class Nothing(Maybe):
      
      18
          _instance = None
      
      19
          def __new__(class_, *args, **kwargs):
      
      20
              if not isinstance(class_._instance, class_):
      
      21
                  class_._instance = object.__new__(class_, *args, **kwargs)
      
      22
              return class_._instance
      
      23
      ​
      
      24
          def __str__(self):
      
      25
              return "<Nothing>"
      
      26
      ​
      
      27
      class Either:
      
      28
          pass
      
      29
      ​
      
      30
      class Left:
      
      31
          def __init__(self, errmsg):
      
      32
              self.errmsg = errmsg
      
      33
      ​
      
      34
          def __str__(self):
      
      35
              return "(Left %s)"%self.errmsg
      
      36
      ​
      
      37
          __repr__ = __str__
      
      38
          def map(self, f):
      
      39
              return self 
      
      40
      ​
      
      41
      class Right:
      
      42
          def __init__(self, val):
      
      43
              self.val = val
      
      44
      ​
      
      45
          def unwrap(self):
      
      46
              return self.val
      
      47
      ​
      
      48
          @property
      
      49
          def val0(self):
      
      50
              if isinstance(self.val[0], list):
      
      51
                  return flatten(self.val[0])
      
      52
              else:
      
      53
                  return [self.val[0]]
      
      54
      ​
      
      55
          def __str__(self):
      
      56
              return "(Right %s)"% str(self.val)
      
      57
          __repr__ = __str__
      
      58
      ​
      
      59
          def map(self, f):
      
      60
              return Right( (f(self.val0), self.val[1])) 
      
      61
      ​
      
      62
      ​
      
      63
      class Parser:
      
      64
          def __init__(self, f, tag=''):
      
      65
              self.f = f
      
      66
              self._suppressed = False
      
      67
              self.tag = tag
      
      68
      ​
      
      69
          def parse(self, *args, **kwargs):
      
      70
              return self.f(*args, **kwargs)
      
      71
      ​
      
      72
          __call__ = parse
      
      73
          
      
      74
          def __rshift__(self, rparser):
      
      75
              return and_then(self, rparser)
      
      76
      ​
      
      77
          def __lshift__(self, rparser):
      
      78
              return and_then(self, rparser)
      
      79
          
      
      80
          def __or__(self, rparser):
      
      81
              return or_else(self, rparser)
      
      82
      ​
      
      83
          def map(self, transformer):
      
      84
              return Parser(lambda *args, **kwargs: self.f(*args, **kwargs).map(transformer), self.tag)
      
      85
      ​
      
      86
          def __mul__(self, times):
      
      87
              return n(self, times) 
      
      88
      ​
      
      89
          set_action = map
      
      90
      ​
      
      91
          def suppress(self):
      
      92
              self._suppressed = True 
      
      93
              return self
      
      94
      ​
      
      95
      def pure(x):
      
      96
          def curried(s):
      
      97
              return Right((x, s))
      
      98
          return Parser(curried, 'pure')
      
      99
      ​
      
      100
      def ap(p1, p2):
      
      101
          def curried(s):
      
      102
              res = p2(s)
      
      103
              return p1(*res.val[0])
      
      104
          return curried
      
      105
      ​
      
      106
      def compose(p1, p2):
      
      107
          def newf(*args, **kwargs):
      
      108
              return p2(p1(*args, **kwargs))
      
      109
          return newf
      
      110
      ​
      
      111
      def run_parser(p, inp):
      
      112
          return p(inp)
      
      113
      ​
      
      114
      def _isokval(v):
      
      115
          if isinstance(v, str) and not v.strip():
      
      116
              return False
      
      117
          if isinstance(v, list) and v and v[0] == "":
      
      118
              return False
      
      119
          return True
      
      120
      ​
      
      121
      def and_then(p1, p2):
      
      122
          def curried(s):
      
      123
              res1 = p1(s)
      
      124
              if isinstance(res1, Left):
      
      125
                  return res1
      
      126
              else:
      
      127
                  res2 = p2(res1.val[1]) # parse remaining chars.
      
      128
                  if isinstance(res2, Right):
      
      129
                      v1 = res1.val0
      
      130
                      v2 = res2.val0
      
      131
                      vs = []
      
      132
                      if not p1._suppressed and _isokval(v1):
      
      133
                          vs += v1 
      
      134
                      if not p2._suppressed and _isokval(v2):
      
      135
                          vs += v2
      
      136
      ​
      
      137
                      return Right( (vs, res2.val[1])) 
      
      138
                  return res2
      
      139
          return Parser(curried, 'and_then')
      
      140
      ​
      
      141
      def n(parser, count):
      
      142
          def curried(s):
      
      143
              fullparsed = ""
      
      144
              for i in range(count):
      
      145
                  res = parser(s)
      
      146
                  if isinstance(res, Left):
      
      147
                      return res
      
      148
                  else:
      
      149
                      parsed, remaining = res.unwrap()
      
      150
                      s = remaining
      
      151
                      fullparsed += parsed
      
      152
              return Right((fullparsed, s))
      
      153
          return Parser(curried, 'n')
      
      154
      ​
      
      155
      def or_else(p1, p2):
      
      156
          def curried(s):
      
      157
              res = p1(s)
      
      158
              if isinstance(res, Right):
      
      159
                  return res
      
      160
              else:
      
      161
                  res = p2(s)
      
      162
                  if isinstance(res, Right):
      
      163
                      return res
      
      164
                  else:
      
      165
                      return Left("Failed at both") 
      
      166
          return Parser(curried, 'or_else')
      
      167
      ​
      
      168
      def char(c):
      
      169
          def curried(s):
      
      170
              if not s:
      
      171
                  msg = "S is empty"
      
      172
                  return Left(msg)
      
      173
              else:
      
      174
                  if s[0] == c:
      
      175
                      return Right((c, s[1:]) )
      
      176
                  else:
      
      177
                      return Left("Expecting '%s' and found '%s'"%(c, s[0]))
      
      178
          return Parser(curried, 'char')
      
      179
      ​
      
      180
      foldl = reduce
      
      181
      def choice(parsers):
      
      182
          return foldl(or_else, parsers)
      
      183
      ​
      
      184
      def any_of(chars):
      
      185
          return choice(list(map(char, chars)))
      
      186
      ​
      
      187
      def parse_string(s):
      
      188
          return foldl(and_then, list(map(char, list(s)))).map(lambda l: "".join(l))
      
      189
      ​
      
      190
      def until_seq(seq):
      
      191
          def curried(s):
      
      192
              if not s:
      
      193
                  msg = "S is empty"
      
      194
                  return Left(msg)
      
      195
              else:
      
      196
                  if seq == s[:len(seq)]:
      
      197
                      return Right(("", s))
      
      198
                  else:
      
      199
                      return Left("Expecting '%s' and found '%s'"%(seq, s[:len(seq)]))
      
      200
          return Parser(curried, 'until_seq')
      
      201
      ​
      
      202
      def until(p):
      
      203
          def curried(s):
      
      204
              res = p(s)
      
      205
              if isinstance(res, Left):
      
      206
                  return res
      
      207
              else:
      
      208
                  return Right(("", s))
      
      209
          return Parser(curried, 'until')
      
      210
      ​
      
      211
      chars = parse_string
      
      212
      ​
      
      213
      def parse_zero_or_more(parser, inp): #zero or more
      
      214
          res = parser(inp)
      
      215
          if isinstance(res, Left):
      
      216
              return "", inp
      
      217
          else:
      
      218
              firstval, restinpafterfirst = res.val
      
      219
              subseqvals, remaining = parse_zero_or_more(parser, restinpafterfirst)
      
      220
              values = firstval
      
      221
              if subseqvals:
      
      222
                  if isinstance(firstval, str):
      
      223
                      values = firstval+subseqvals
      
      224
                  elif isinstance(firstval, list):
      
      225
                      values = firstval+ ([subseqvals] if isinstance(subseqvals, str) else subseqvals)
      
      226
              return values, remaining
      
      227
      ​
      
      228
      def many(parser):
      
      229
          def curried(s):
      
      230
              return Right(parse_zero_or_more(parser,s))
      
      231
          return Parser(curried, 'many')
      
      232
      ​
      
      233
      ​
      
      234
      def many1(parser):
      
      235
          def curried(s):
      
      236
              res = run_parser(parser, s)
      
      237
              if isinstance(res, Left):
      
      238
                  return res
      
      239
              else:
      
      240
                  return run_parser(many(parser), s)
      
      241
          return Parser(curried, 'many1')
      
      242
      ​
      
      243
      ​
      
      244
      def optionally(parser):
      
      245
          noneparser = Parser(lambda x: Right( (Nothing(), "")))
      
      246
          return or_else(parser, noneparser)
      
      247
      ​
      
      248
      def sep_by1(sep, parser):
      
      249
          sep_then_parser = sep >> parser
      
      250
          return parser >> many(sep_then_parser)
      
      251
      ​
      
      252
      def sep_by(sep, parser):
      
      253
          return (sep_by1(sep, parser) | Parser(lambda x: Right( ([], "")), 'sep_by'))
      
      254
      ​
      
      255
      def forward(parsergeneratorfn):
      
      256
          def curried(s):
      
      257
              return parsergeneratorfn()(s)
      
      258
          return curried
      
      259
      ​
      
      260
      letter = any_of(string.ascii_letters)
      
      261
      letter.tag = 'letter'
      
      262
      lletter = any_of(string.ascii_lowercase)
      
      263
      lletter.tag = 'lletter'
      
      264
      uletter = any_of(string.ascii_uppercase)
      
      265
      uletter.tag = 'uletter'
      
      266
      digit = any_of(string.digits)
      
      267
      digit.tag = 'digit'
      
      268
      digits = many1(digit)
      
      269
      digits.tag = 'digits'
      
      270
      whitespace = any_of(string.whitespace)
      
      271
      whitespace.tag = 'whitespace'
      
      272
      ws = whitespace.suppress()
      
      273
      ws.tag = 'ws'
      
      274
      letters = many1(letter)
      
      275
      letters.tag = 'letters'
      
      276
      word = letters
      
      277
      word.tag = 'word'
      
      278
      alphanumword = many(letter >> (letters|digits))
      
      279
      alphanumword.tag = 'alphanumword'
      
      280
      num_as_int = digits.map(lambda l: int("".join(l)))
      
      281
      num_as_int.tag = 'num_as_int'
      
      282
      between = lambda p1, p2 , p3 : p1 >> p2 >> p3
      
      283
      surrounded_by = lambda surparser, contentparser: surparser >> contentparser >> surparser
      
      284
      quotedword = surrounded_by( (char('"')|char("'")).suppress() , word)
      
      285
      quotedword.tag = 'quotedword'
      
      286
      option = optionally
      
      287
      option.tag = 'optionally'
      
      288
      ​
      
      289
      # commasepareted_p = sep_by(char(",").suppress(), many1(word) | many1(digit) | many1(quotedword))
      
      290
      commaseparated_of = lambda p: sep_by(char(",").suppress(), many(p))
      
      executed in 7ms, finished 06:26:56 2019-08-15
      . . .
      In [353]:
      xxxxxxxxxx
      
      3
       
      1
      with open('build/myparsec.py', 'w+') as f:
      
      2
          src = rewrite(VARS['myparsec_src'], original='myparsec.py')
      
      3
          print(src, file=f)
      
      executed in 36ms, finished 06:26:57 2019-08-15
      . . .
      1
       
      1
      ### Subject - assignment
      

      2.5.2  Subject - assignment¶

      In [354]:
      xxxxxxxxxx
      
      17
       
      1
      %%var parsec_src
      
      2
      import string
      
      3
      import json
      
      4
      import sys
      
      5
      import myparsec as pyparsec
      
      6
      ​
      
      7
      alphap = pyparsec.char('a')
      
      8
      alphap.tag = 'alphap'
      
      9
      eqp = pyparsec.char('=')
      
      10
      eqp.tag = 'eqp'
      
      11
      digitp = pyparsec.digits
      
      12
      digitp.tag = 'digitp'
      
      13
      abcparser = alphap >> eqp >> digitp
      
      14
      abcparser.tag = 'abcparser'
      
      15
      ​
      
      16
      def main(arg):
      
      17
          abcparser.parse(arg)
      
      executed in 6ms, finished 06:26:59 2019-08-15
      . . .
      1
       
      1
      ### Sample
      

      2.5.3  Sample¶

      In [355]:
      xxxxxxxxxx
      
      3
       
      1
      parsec_samples = [
      
      2
      'a=0'
      
      3
      ]
      
      executed in 6ms, finished 06:27:00 2019-08-15
      . . .
      1
       
      1
      ### Recovering the parse tree
      

      2.5.4  Recovering the parse tree¶

      In [356]:
      xxxxxxxxxx
      
      22
       
      1
      def accio_tree(fname, src, samples, restrict=True):
      
      2
          program_src[fname] = src
      
      3
          with open('subjects/%s' % fname, 'w+') as f:
      
      4
              print(src, file=f)
      
      5
          resrc = rewrite(src, fname)
      
      6
          if restrict:
      
      7
              resrc = resrc.replace('restrict = {\'files\': [sys.argv[0]]}', 'restrict = {}')
      
      8
          with open('build/%s' % fname, 'w+') as f:
      
      9
              print(resrc, file=f)
      
      10
          os.makedirs('samples/%s' % fname, exist_ok=True)
      
      11
          sample_files = {("samples/%s/%d.csv"%(fname,i)):s for i,s in enumerate(samples)}
      
      12
          for k in sample_files:
      
      13
              with open(k, 'w+') as f:
      
      14
                  print(sample_files[k], file=f)
      
      15
      ​
      
      16
          call_trace = []
      
      17
          for i in sample_files:
      
      18
              my_tree = do(["python", "./build/%s" % fname, i]).stdout
      
      19
              call_trace.append(json.loads(my_tree)[0])
      
      20
          mined_tree = miner(call_trace)
      
      21
          generalized_tree = generalize_iter(mined_tree)
      
      22
          return generalized_tree
      
      executed in 8ms, finished 06:27:02 2019-08-15
      . . .
      In [357]:
      xxxxxxxxxx
      
      1
       
      1
      parsec_trees = accio_tree('parsec.py', VARS['parsec_src'], parsec_samples)
      
      executed in 82ms, finished 06:27:03 2019-08-15
      . . .
      In [358]:
      xxxxxxxxxx
      
      1
       
      1
      zoom(display_tree(parsec_trees[0]['tree'], extract_node=extract_node_o))
      
      executed in 88ms, finished 06:27:04 2019-08-15
      Out[358]:
      . . .
      1
       
      1
      ## Parsing with PEG Parser
      

      2.6  Parsing with PEG Parser¶

      In [334]:
      xxxxxxxxxx
      
      57
       
      1
      %%var peg_src
      
      2
      import re
      
      3
      RE_NONTERMINAL = re.compile(r'(<[^<> ]*>)')
      
      4
      ​
      
      5
      def canonical(grammar, letters=False):
      
      6
          def split(expansion):
      
      7
              if isinstance(expansion, tuple): expansion = expansion[0]
      
      8
              return [token for token in re.split(RE_NONTERMINAL, expansion) if token]
      
      9
          def tokenize(word): return list(word) if letters else [word]
      
      10
          def canonical_expr(expression):
      
      11
              return [token for word in split(expression)
      
      12
                  for token in ([word] if word in grammar else tokenize(word))]
      
      13
          return {k: [canonical_expr(expression) for expression in alternatives]
      
      14
              for k, alternatives in grammar.items()}
      
      15
      ​
      
      16
      def crange(character_start, character_end):
      
      17
          return [chr(i) for i in range(ord(character_start), ord(character_end) + 1)]
      
      18
      ​
      
      19
      def unify_key(grammar, key, text, at=0):
      
      20
          if key not in grammar:
      
      21
              if text[at:].startswith(key):
      
      22
                  return at + len(key), (key, [])
      
      23
              else:
      
      24
                  return at, None
      
      25
          for rule in grammar[key]:
      
      26
              to, res = unify_rule(grammar, rule, text, at)
      
      27
              if res:
      
      28
                  return (to, (key, res))
      
      29
          return 0, None
      
      30
      ​
      
      31
      def unify_rule(grammar, rule, text, at):
      
      32
          results = []
      
      33
          for token in rule:
      
      34
              at, res = unify_key(grammar, token, text, at)
      
      35
              if res is None:
      
      36
                  return at, None
      
      37
              results.append(res)
      
      38
          return at, results
      
      39
      ​
      
      40
      import string
      
      41
      VAR_GRAMMAR = {
      
      42
          '<start>': ['<assignment>'],
      
      43
          '<assignment>': ['<identifier>=<expr>'],
      
      44
          '<identifier>': ['<word>'],
      
      45
          '<word>': ['<alpha><word>', '<alpha>'],
      
      46
          '<alpha>': list(string.ascii_letters),
      
      47
          '<expr>': ['<term>+<expr>', '<term>-<expr>', '<term>'],
      
      48
          '<term>': ['<factor>*<term>', '<factor>/<term>', '<factor>'],
      
      49
          '<factor>':
      
      50
          ['+<factor>', '-<factor>', '(<expr>)', '<identifier>', '<number>'],
      
      51
          '<number>': ['<integer>.<integer>', '<integer>'],
      
      52
          '<integer>': ['<digit><integer>', '<digit>'],
      
      53
          '<digit>': crange('0', '9')
      
      54
      }
      
      55
      def main(arg):
      
      56
          C_VG = canonical(VAR_GRAMMAR)
      
      57
          unify_key(C_VG, '<start>', arg)
      
      executed in 11ms, finished 06:14:35 2019-08-15
      . . .
      1
       
      1
      ### PEG samples
      

      2.6.1  PEG samples¶

      In [335]:
      xxxxxxxxxx
      
      3
       
      1
      peg_samples = [
      
      2
          'a=0',
      
      3
      ]
      
      executed in 6ms, finished 06:14:35 2019-08-15
      . . .
      In [336]:
      xxxxxxxxxx
      
      1
       
      1
      peg_trees = accio_tree('peg.py', VARS['peg_src'], peg_samples, False)
      
      executed in 302ms, finished 06:14:35 2019-08-15
      . . .
      In [337]:
      xxxxxxxxxx
      
      1
       
      1
      zoom(display_tree(peg_trees[0]['tree'], extract_node=extract_node_o))
      
      executed in 98ms, finished 06:14:35 2019-08-15
      Out[337]:
      . . .
      CloseExpandOpen in PagerClose